
Cortical encoding of speech enhances task-relevant acoustic information

A Publisher Correction to this article was published on 27 August 2019


Speech is the most important signal in our auditory environment, and the processing of speech is highly dependent on context. However, it is unknown how contextual demands influence the neural encoding of speech. Here, we examine the context dependence of auditory cortical mechanisms for speech encoding at the level of the representation of fundamental acoustic features (spectrotemporal modulations) using model-based functional magnetic resonance imaging. We found that the performance of different tasks on identical speech sounds leads to neural enhancement of the acoustic features in the stimuli that are critically relevant to task performance. These task effects were observed at the earliest stages of auditory cortical processing, in line with interactive accounts of speech processing. Our work provides important insights into the mechanisms that underlie the processing of contextually relevant acoustic information within our rich and dynamic auditory environment.
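The model-based fMRI approach described above can be sketched as a linear encoding model: a stimulus feature matrix S (sounds × spectrotemporal modulation features) is mapped to voxel responses with ridge regression. The names, shapes, and penalty below are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

# Hypothetical sketch of a model-based fMRI encoding analysis:
# fit a linear mapping from a feature matrix S (sounds x features,
# e.g. spectrotemporal modulation energies) to voxel responses Y
# (sounds x voxels) using ridge regression. All shapes and the
# penalty value are illustrative assumptions.

rng = np.random.default_rng(0)
n_sounds, n_features, n_voxels = 120, 50, 200

S = rng.standard_normal((n_sounds, n_features))        # feature matrix S
W_true = rng.standard_normal((n_features, n_voxels))   # simulated voxel tuning
Y = S @ W_true + 0.1 * rng.standard_normal((n_sounds, n_voxels))

lam = 1.0  # ridge penalty; in practice chosen by cross-validation
W_hat = np.linalg.solve(S.T @ S + lam * np.eye(n_features), S.T @ Y)

# Predictions for held-out sounds would be S_test @ W_hat; here we
# just check that the fit recovers the simulated tuning.
corr = np.corrcoef(W_true.ravel(), W_hat.ravel())[0, 1]
print(round(corr, 2))
```

In this framing, the estimated weights per voxel play the role of a modulation transfer function, and task effects would appear as differences in the weights fitted separately for the speaker and phoneme tasks.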


Fig. 1: Target-specific modulation profiles for the six individual target sounds.
Fig. 2: Activations evoked by speech sounds during the speaker and phoneme tasks.
Fig. 3: Probabilistic maps of the ROIs.
Fig. 4: Marginal modulation profiles of the MTFs during the speaker and phoneme tasks.
Fig. 5: Marginal modulation profiles of the task-difference MTFs.
Fig. 6: Dissociated spectral and temporal modulation profiles for the two tasks.
Fig. 7: Target classification accuracies obtained during task performance.

Data availability

The stimuli, their sound representations (feature matrix S) and the estimated fMRI responses (beta-weights) from a subset of the participants in this study are available as Supplementary Audio Files and Supplementary Data 1 and 2.

Code availability

The code that supports the findings of this study is available from the corresponding author upon reasonable request.

Change history

  • 27 August 2019

    An amendment to this paper has been published and can be accessed via a link at the top of the paper.

    The original and corrected figures are shown in the accompanying Publisher Correction.




Acknowledgements

We thank the staff at the Center for Biomedical Imaging EPFL, Vaud, Switzerland for access to the imaging platform, and W. van der Zwaag for facilitating data collection; J. Gonzalez for helping with auditory recording; F. Zay for reading the stimuli; C. Türk for assisting during data collection; L. Ermacora for the phonetic segmentation of the stimuli; F. De Martino for providing code for analysing the data; V. de Angelis and N. Disbergen for helping with data analysis; G. Valente for helping with the statistical analysis; and D. Gallichan for motion correction of the anatomical images. This work was supported by the Swiss National Science Foundation (grant numbers PP00P3_133701, PP00P3_163756 and 100014_182381 awarded to N.G.) and the University of Geneva Language and Communication Research Network. E.F. was supported by The Netherlands Organisation for Scientific Research (VICI grant number 453-12-002) and the Dutch Province of Limburg. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information




All authors contributed to the conception and design of the experiment. N.G. and E.F. supervised the study. S.R. created the behavioural task and stimuli, programmed the fMRI experiment, collected, analysed (including writing code) and interpreted the data, and wrote the manuscript. R.S. helped to program the fMRI experiment and to analyse the data (including writing code). A.H.-A. helped to create the stimuli and to implement the behavioural task. E.F. supervised the data analysis (including writing and implementing code), guided data interpretation and helped to write the manuscript. N.G. helped to create the stimuli, to guide the data analysis and interpretation and to write the manuscript.

Corresponding author

Correspondence to Sanne Rutten.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Primary Handling Editor: Mary Elizabeth Sutherland.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–16, Supplementary Tables 1 and 2, Supplementary Results 1 and 2, Supplementary Methods 1–7.

Reporting Summary

Supplementary Audio Files

Audio files of the stimuli used in the paper (for a complete description of each file, see Supplementary Information guide).

Supplementary Data 1

Feature matrix S that was obtained from the stimuli (for more information, see Supplementary Information guide).

Supplementary Data 2

Beta-weights that represent the fMRI responses to individual speech sounds for an example ROI (for more information, see Supplementary Information guide).


About this article

Cite this article

Rutten, S., Santoro, R., Hervais-Adelman, A. et al. Cortical encoding of speech enhances task-relevant acoustic information. Nat Hum Behav 3, 974–987 (2019).
