Joint, distributed and hierarchically organized encoding of linguistic features in the human auditory cortex

Abstract

The precise role of the human auditory cortex in representing speech sounds and transforming them to meaning is not yet fully understood. Here we used intracranial recordings from the auditory cortex of neurosurgical patients as they listened to natural speech. We found an explicit, temporally ordered and anatomically distributed neural encoding of multiple linguistic features, including phonetic, prelexical phonotactics, word frequency, and lexical–phonological and lexical–semantic information. Grouping neural sites on the basis of their encoded linguistic features revealed a hierarchical pattern, with distinct representations of prelexical and postlexical features distributed across various auditory areas. While sites with longer response latencies and greater distance from the primary auditory cortex encoded higher-level linguistic features, the encoding of lower-level features was preserved and not discarded. Our study reveals a cumulative mapping of sound to meaning and provides empirical evidence for validating neurolinguistic and psycholinguistic models of spoken word recognition that preserve the acoustic variations in speech.

Fig. 1: Coverage and paradigm.
Fig. 2: Diversity in linguistic feature encoding.
Fig. 3: Temporal profile of linguistic feature encoding.
Fig. 4: Spatial profile of linguistic feature encoding.

Data availability

Linguistic features were extracted from the SUBTLEX-US word frequency dataset (ref. 59) and the English Lexicon Project website (https://elexicon.wustl.edu/). The data that support the findings of this study are available upon request from the corresponding author (N.M.); they are shared on request because of the sensitive nature of human patient data.

References

  1. Chomsky, N. & Halle, M. The Sound Pattern of English (Harper & Row, 1968).

  2. Vitevitch, M. S. & Luce, P. A. Probabilistic phonotactics and neighborhood activation in spoken word recognition. J. Mem. Lang. 40, 374–408 (1999).

  3. Kiparsky, P. Word-formation and the lexicon. In Mid-America Linguistics Conference 3–29 (Mid-America Linguistics Conference, University of Kansas, Kansas, 1982).

  4. Luce, P. A. & Pisoni, D. B. Recognizing spoken words: the neighborhood activation model. Ear Hear. 19, 1–36 (1998).

  5. Buchanan, L., Westbury, C. & Burgess, C. Characterizing semantic space: neighborhood effects in word recognition. Psychon. Bull. Rev. 8, 531–544 (2001).

  6. Grosjean, F. Spoken word recognition processes and the gating paradigm. Percept. Psychophys. 28, 267–283 (1980).

  7. Marslen-Wilson, W. D. Speech shadowing and speech comprehension. Speech Commun. 4, 55–73 (1985).

  8. Marslen-Wilson, W. D. Functional parallelism in spoken word-recognition. Cognition 25, 71–102 (1987).

  9. Allopenna, P. D., Magnuson, J. S. & Tanenhaus, M. K. Tracking the time course of spoken word recognition using eye movements: evidence for continuous mapping models. J. Mem. Lang. 38, 419–439 (1998).

  10. Dahan, D. & Magnuson, J. S. in Handbook of Psycholinguistics (eds Traxler, M. J. & Gernsbacher, M. A.) 249–283 (Elsevier, 2006); https://doi.org/10.1016/B978-012369374-7/50009-2

  11. Magnuson, J. S., Mirman, D. & Harris, H. D. in The Cambridge Handbook of Psycholinguistics (eds Spivey, M. et al.) 76–103 (Cambridge Univ. Press, 2012); https://doi.org/10.1017/cbo9781139029377.008

  12. Pisoni, D. B. & McLennan, C. T. in Neurobiology of Language (eds Hickok, G. & Small, S. L.) 239–253 (Elsevier, 2015); https://doi.org/10.1016/B978-0-12-407794-2.00020-1

  13. Bidelman, G. M., Moreno, S. & Alain, C. Tracing the emergence of categorical speech perception in the human auditory system. NeuroImage 79, 201–212 (2013).

  14. Fernald, A., Swingley, D. & Pinto, J. P. When half a word is enough: infants can recognize spoken words using partial phonetic information. Child Dev. 72, 1003–1015 (2001).

  15. Magnuson, J. S., Dixon, J. A., Tanenhaus, M. K. & Aslin, R. N. The dynamics of lexical competition during spoken word recognition. Cogn. Sci. 31, 133–156 (2007).

  16. Yee, E. & Sedivy, J. C. Eye movements to pictures reveal transient semantic activation during spoken word recognition. J. Exp. Psychol. Learn. Mem. Cogn. 32, 1–14 (2006).

  17. Tyler, L. K., Voice, J. K. & Moss, H. E. The interaction of meaning and sound in spoken word recognition. Psychon. Bull. Rev. 7, 320–326 (2000).

  18. Mirman, D. & Magnuson, J. S. Dynamics of activation of semantically similar concepts during spoken word recognition. Mem. Cogn. 37, 1026–1039 (2009).

  19. McClelland, J. L. & Elman, J. L. The TRACE model of speech perception. Cogn. Psychol. 18, 1–86 (1986).

  20. Scharenborg, O. Modeling the use of durational information in human spoken-word recognition. J. Acoust. Soc. Am. 127, 3758–3770 (2010).

  21. Norris, D. Shortlist: a connectionist model of continuous speech recognition. Cognition 52, 189–234 (1994).

  22. Scharenborg, O., Norris, D., ten Bosch, L. & McQueen, J. M. How should a speech recognizer work? Cogn. Sci. 29, 867–918 (2005).

  23. Luce, P. A., Goldinger, S. D., Auer, E. T. & Vitevitch, M. S. Phonetic priming, neighborhood activation, and PARSYN. Percept. Psychophys. 62, 615–625 (2000).

  24. Gaskell, M. G. & Marslen-Wilson, W. D. Integrating form and meaning: a distributed model of speech perception. Lang. Cogn. Process. 12, 613–656 (1997).

  25. Norris, D. in Cognitive Models of Speech Processing: Psycholinguistic and Computational Perspectives (ed. Altmann, G. T. M.) 87–104 (MIT, 1990).

  26. DeWitt, I. & Rauschecker, J. P. Phoneme and word recognition in the auditory ventral stream. Proc. Natl Acad. Sci. USA 109, E505–E514 (2012).

  27. Poeppel, D. The neuroanatomic and neurophysiological infrastructure for speech and language. Curr. Opin. Neurobiol. 28, 142–149 (2014).

  28. Price, C. J. The anatomy of language: a review of 100 fMRI studies published in 2009. Ann. N. Y. Acad. Sci. 1191, 62–88 (2010).

  29. de Heer, W. A., Huth, A. G., Griffiths, T. L., Gallant, J. L. & Theunissen, F. E. The hierarchical cortical organization of human speech processing. J. Neurosci. 37, 6539–6557 (2017).

  30. Langers, D. R., Backes, W. H. & van Dijk, P. Spectrotemporal features of the auditory cortex: the activation in response to dynamic ripples. NeuroImage 20, 265–275 (2003).

  31. Chan, A. M. et al. Speech-specific tuning of neurons in human superior temporal gyrus. Cereb. Cortex 24, 2679–2693 (2013).

  32. Chang, E. F. et al. Categorical speech representation in human superior temporal gyrus. Nat. Neurosci. 13, 1428–1432 (2010).

  33. Mesgarani, N., David, S. V., Fritz, J. B. & Shamma, S. A. Phoneme representation and classification in primary auditory cortex. J. Acoust. Soc. Am. 123, 899–909 (2008).

  34. Steinschneider, M. et al. Differential activation of human core, non-core and auditory-related cortex during speech categorization tasks as revealed by intracranial recordings. Front. Neurosci. 8, 240 (2014).

  35. Ding, N., Melloni, L., Zhang, H., Tian, X. & Poeppel, D. Cortical tracking of hierarchical linguistic structures in connected speech. Nat. Neurosci. 19, 158–164 (2015).

  36. Honey, C. J. et al. Slow cortical dynamics and the accumulation of information over long timescales. Neuron 76, 423–434 (2012).

  37. Leonard, M. K., Bouchard, K. E., Tang, C. & Chang, E. F. Dynamic encoding of speech sequence probability in human temporal cortex. J. Neurosci. 35, 7203–7214 (2015).

  38. Lerner, Y., Honey, C. J., Silbert, L. J. & Hasson, U. Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. J. Neurosci. 31, 2906–2915 (2011).

  39. Overath, T., McDermott, J. H., Zarate, J. M. & Poeppel, D. The cortical analysis of speech-specific temporal structure revealed by responses to sound quilts. Nat. Neurosci. 18, 903–911 (2015).

  40. Caramazza, A., Berndt, R. S. & Basili, A. G. The selective impairment of phonological processing: a case study. Brain Lang. 18, 128–174 (1983).

  41. Engelien, A. et al. The neural correlates of ‘deaf-hearing’ in man: conscious sensory awareness enabled by attentional modulation. Brain 123, 532–545 (2000).

  42. Auerbach, S. H., Allard, T., Naeser, M., Alexander, M. P. & Albert, M. L. Pure word deafness: analysis of a case with bilateral lesions and a defect at the prephonemic level. Brain 105, 271–300 (1982).

  43. Wang, E., Peach, R. K., Xu, Y., Schneck, M. & Manry, C. II Perception of dynamic acoustic patterns by an individual with unilateral verbal auditory agnosia. Brain Lang. 73, 442–455 (2000).

  44. Poeppel, D. Pure word deafness and the bilateral processing of the speech code. Cogn. Sci. 25, 679–693 (2001).

  45. Franklin, S., Turner, J., Ralph, M. A. L., Morris, J. & Bailey, P. J. A distinctive case of word meaning deafness? Cogn. Neuropsychol. 13, 1139–1162 (1996).

  46. Boatman, D. et al. Transcortical sensory aphasia: revisited and revised. Brain 123, 1634–1642 (2000).

  47. Kohn, S. E. & Friedman, R. B. Word-meaning deafness: a phonological–semantic dissociation. Cogn. Neuropsychol. 3, 291–308 (1986).

  48. Rauschecker, J. P. & Scott, S. K. Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat. Neurosci. 12, 718–724 (2009).

  49. Rauschecker, J. P. in The Senses: A Comprehensive Reference (ed. Fritzsch, B.) 791–811 (Elsevier, 2020).

  50. Gaskell, M. G. & Marslen-Wilson, W. D. Representation and competition in the perception of spoken words. Cogn. Psychol. 45, 220–266 (2002).

  51. Ray, S. & Maunsell, J. H. R. Different origins of gamma rhythm and high-gamma activity in macaque visual cortex. PLoS Biol. 9, e1000610 (2011).

  52. Buzsáki, G., Anastassiou, C. A. & Koch, C. The origin of extracellular fields and currents—EEG, ECoG, LFP and spikes. Nat. Rev. Neurosci. 13, 407–420 (2012).

  53. Clarke, S. & Morosan, P. in The Human Auditory Cortex (eds Poeppel, D., Overath, T., Popper, A. & Fay, R.) 11–38 (Springer, 2012).

  54. Norman-Haignere, S. V. & McDermott, J. H. Neural responses to natural and model-matched stimuli reveal distinct computations in primary and nonprimary auditory cortex. PLoS Biol. 16, e2005127 (2018).

  55. Baumann, S., Petkov, C. I. & Griffiths, T. D. A unified framework for the organization of the primate auditory cortex. Front. Syst. Neurosci. 0, 11 (2013).

  56. Shaoul, C. & Westbury, C. Exploring lexical co-occurrence space using HiDEx. Behav. Res. Methods 42, 393–413 (2010).

  57. Ladefoged, P. & Johnson, K. A Course in Phonetics (Wadsworth Publishing Company, 2010).

  58. Mesgarani, N., Cheung, C., Johnson, K. & Chang, E. F. Phonetic feature encoding in human superior temporal gyrus. Science 343, 1006–1010 (2014).

  59. Brysbaert, M. & New, B. Moving beyond Kučera and Francis: a critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behav. Res. Methods 41, 977–990 (2009).

  60. Ylinen, S. et al. Predictive coding of phonological rules in auditory cortex: a mismatch negativity study. Brain Lang. 162, 72–80 (2016).

  61. Friston, K. The free-energy principle: a rough guide to the brain? Trends Cogn. Sci. 13, 293–301 (2009).

  62. Gagnepain, P., Henson, R. N. & Davis, M. H. Temporal predictive codes for spoken words in auditory cortex. Curr. Biol. 22, 615–621 (2012).

  63. Brodbeck, C., Hong, L. E. & Simon, J. Z. Rapid transformation from auditory to linguistic representations of continuous speech. Curr. Biol. 28, 3976–3983 (2018).

  64. Dahan, D., Magnuson, J. S. & Tanenhaus, M. K. Time course of frequency effects in spoken-word recognition: evidence from eye movements. Cogn. Psychol. 42, 317–367 (2001).

  65. Marslen-Wilson, W. D. & Welsh, A. Processing interactions and lexical access during word recognition in continuous speech. Cogn. Psychol. 10, 29–63 (1978).

  66. Balling, L. W. & Baayen, R. H. Probability and surprisal in auditory comprehension of morphologically complex words. Cognition 125, 80–106 (2012).

  67. Wurm, L. H., Ernestus, M., Schreuder, R. & Baayen, R. H. Dynamics of the auditory comprehension of prefixed words. Ment. Lex. 1, 125–146 (2006).

  68. Balota, D. A. et al. The English Lexicon Project. Behav. Res. Methods 39, 445–459 (2007).

  69. Danguecan, A. N. & Buchanan, L. Semantic neighborhood effects for abstract versus concrete words. Front. Psychol. 7, 1034 (2016).

  70. Mirman, D. & Magnuson, J. S. The impact of semantic neighborhood density on semantic access. In Proc. 28th Annual Conference of the Cognitive Science Society (eds Sun, R. & Miyake, N.) 1823–1828 (2006).

  71. Broderick, M. P., Anderson, A. J., di Liberto, G. M., Crosse, M. J. & Lalor, E. C. Electrophysiological correlates of semantic dissimilarity reflect the comprehension of natural, narrative speech. Curr. Biol. 28, 803–809.e3 (2018).

  72. di Liberto, G. M., O’Sullivan, J. A. & Lalor, E. C. Low-frequency cortical entrainment to speech reflects phoneme-level processing. Curr. Biol. 25, 2457–2465 (2015).

  73. Di Liberto, G. M., Wong, D., Melnik, G. A. & de Cheveigné, A. Low-frequency cortical responses to natural speech reflect probabilistic phonotactics. NeuroImage 196, 237–247 (2019).

  74. Yang, X.-B., Wang, K. & Shamma, S. A. Auditory representations of acoustic signals. IEEE Trans. Inf. Theory 38, 824–839 (1992).

  75. Kluender, K. R., Coady, J. A. & Kiefte, M. Sensitivity to change in perception of speech. Speech Commun. 41, 59–69 (2003).

  76. Daube, C., Ince, R. A. A. & Gross, J. Simple acoustic features can explain phoneme-based predictions of cortical responses to speech. Curr. Biol. 29, 1924–1937 (2019).

  77. Fischl, B. et al. Automatically parcellating the human cerebral cortex. Cereb. Cortex 14, 11–22 (2004).

  78. Pisoni, D. B. in Talker Variability in Speech Processing (eds Johnson, K. & Mullennix, J. W.) 9–32 (Morgan Kaufmann, 1997).

  79. Luce, P. A. & McLennan, C. T. in The Handbook of Speech Perception (eds Pisoni, D. B. & Remez, R. E.) 590–609 (Blackwell, 2008); https://doi.org/10.1002/9780470757024.ch24

  80. Port, R. F. Rich memory and distributed phonology. Lang. Sci. 32, 43–55 (2010).

  81. Nosofsky, R. M. Attention, similarity, and the identification–categorization relationship. J. Exp. Psychol. Gen. 115, 39–57 (1986).

  82. Kruschke, J. K. ALCOVE: an exemplar-based connectionist model of category learning. Psychol. Rev. 99, 22–44 (1992).

  83. Magnuson, J. S., Nusbaum, H. C., Akahane-Yamada, R. & Saltzman, D. Talker familiarity and the accommodation of talker variability. Atten. Percept. Psychophys. 83, 1842–1860 (2021).

  84. McLaughlin, D., Dougherty, S., Lember, R. & Perrachione, T. Episodic memory for words enhances the language familiarity effect in talker identification. In Proc. 18th International Congress of Phonetic Sciences (ed. The Scottish Consortium for ICPhS 2015) 367.1-4 (University of Glasgow, Glasgow, 2015).

  85. Choi, J. Y., Hu, E. R. & Perrachione, T. K. Varying acoustic–phonemic ambiguity reveals that talker normalization is obligatory in speech processing. Atten. Percept. Psychophys. 80, 784–797 (2018).

  86. Pisoni, D. B. & Levi, S. V. in The Oxford Handbook of Psycholinguistics (ed. Gaskell, M. G.) 3–18 (Oxford Univ. Press, 2007); https://doi.org/10.1093/oxfordhb/9780198568971.013.0001

  87. Vitevitch, M. S., Luce, P. A., Pisoni, D. B. & Auer, E. T. Phonotactics, neighborhood activation, and lexical access for spoken words. Brain Lang. 68, 306–311 (1999).

  88. von Economo, C. F. & Koskinas, G. N. Die Cytoarchitektonik der Hirnrinde des Erwachsenen Menschen (J. Springer, 1925).

  89. Galaburda, A. & Sanides, F. Cytoarchitectonic organization of the human auditory cortex. J. Comp. Neurol. 190, 597–610 (1980).

  90. Morosan, P., Rademacher, J., Palomero-Gallagher, N. & Zilles, K. in The Auditory Cortex (eds Heil, P., Scheich, H., Budinger, E. & Konig, R.) 45–68 (Psychology Press, 2005).

  91. Hopf, A. Die Myeloarchitektonik des Isocortex Temporalis Beim Menschen (De Gruyter, 1951).

  92. Moerel, M., De Martino, F. & Formisano, E. An anatomical and functional topography of human auditory cortical areas. Front. Neurosci. 8, 225 (2014).

  93. Nourski, K. V. Auditory processing in the human cortex: an intracranial electrophysiology perspective. Laryngoscope Investig. Otolaryngol. 2, 147–156 (2017).

  94. Griffiths, T. D. & Warren, J. D. The planum temporale as a computational hub. Trends Neurosci. 25, 348–353 (2002).

  95. Hillis, A. E., Rorden, C. & Fridriksson, J. Brain regions essential for word comprehension: drawing inferences from patients. Ann. Neurol. 81, 759–768 (2017).

  96. Mesulam, M.-M. et al. Word comprehension in temporal cortex and Wernicke area. Neurology 92, e224–e233 (2019).

  97. Binder, J. R. Current controversies on Wernicke’s area and its role in language. Curr. Neurol. Neurosci. Rep. 17, 58 (2017).

  98. Muller, L., Hamilton, L. S., Edwards, E., Bouchard, K. E. & Chang, E. F. Spatial resolution dependence on spectral frequency in human speech cortex electrocorticography. J. Neural Eng. 13, 56013 (2016).

  99. Khodagholy, D. et al. NeuroGrid: recording action potentials from the surface of the brain. Nat. Neurosci. 18, 310–315 (2015).

  100. Blumstein, S. E., Baker, E. & Goodglass, H. Phonological factors in auditory comprehension in aphasia. Neuropsychologia 15, 19–30 (1977).

  101. Norris, D., McQueen, J. M. & Cutler, A. Prediction, Bayesian inference and feedback in speech recognition. Lang. Cogn. Neurosci. 31, 4–18 (2016).

  102. Magnuson, J. S., Mirman, D., Luthra, S., Strauss, T. & Harris, H. D. Interaction in spoken word recognition models: feedback helps. Front. Psychol. 9, 369 (2018).

  103. Norris, D. & McQueen, J. M. Shortlist B: a Bayesian model of continuous speech recognition. Psychol. Rev. 115, 357–395 (2008).

  104. Hamilton, L. S., Oganian, Y., Hall, J. & Chang, E. F. Parallel and distributed encoding of speech across human auditory cortex. Cell 184, 4626–4639.e13 (2021).

  105. Groppe, D. M. et al. iELVis: an open source MATLAB toolbox for localizing and visualizing human intracranial electrode data. J. Neurosci. Methods 281, 40–48 (2017).

  106. Jenkinson, M. & Smith, S. A global optimisation method for robust affine registration of brain images. Med. Image Anal. 5, 143–156 (2001).

  107. Jenkinson, M., Bannister, P., Brady, M. & Smith, S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage 17, 825–841 (2002).

  108. Smith, S. M. Fast robust automated brain extraction. Hum. Brain Mapp. 17, 143–155 (2002).

  109. Papademetris, X. et al. BioImage Suite: an integrated medical image analysis suite: an update. Insight J. 2006, 209 (2006).

  110. Sweet, R. A., Dorph‐Petersen, K. & Lewis, D. A. Mapping auditory core, lateral belt, and parabelt cortices in the human superior temporal gyrus. J. Comp. Neurol. 491, 270–289 (2005).

  111. Ozker, M., Schepers, I. M., Magnotti, J. F., Yoshor, D. & Beauchamp, M. S. A double dissociation between anterior and posterior superior temporal gyrus for processing audiovisual speech demonstrated by electrocorticography. J. Cogn. Neurosci. 29, 1044–1060 (2017).

  112. Gorman, K., Howell, J. & Wagner, M. Prosodylab-Aligner: a tool for forced alignment of laboratory speech. Can. Acoust. 39, 192–193 (2011).

  113. Crosse, M. J., di Liberto, G. M., Bednar, A. & Lalor, E. C. The Multivariate Temporal Response Function (mTRF) toolbox: a MATLAB toolbox for relating neural signals to continuous stimuli. Front. Hum. Neurosci. 10, 604 (2016).

  114. Ward, J. H. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244 (1963).

Acknowledgements

This work was funded by National Institutes of Health grant no. R01DC018805 (N.M.) and National Institute on Deafness and Other Communication Disorders grant no. R01DC014279 (N.M.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations

Authors

Contributions

M.K. and N.M. designed the experiment. M.K., S.A., J.H., S.B., A.D.M. and N.M. recorded the neural data. M.K. and N.M. analysed the data and wrote the manuscript. All authors commented on the manuscript.

Corresponding author

Correspondence to Nima Mesgarani.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Frederic Theunissen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Electrode locations.

Electrodes are distributed across fifteen subjects and are either depth electrodes or subdural grids and/or strips. Shape indicates electrode type: circles represent depth electrodes and triangles represent subdural contacts. Colour indicates which of the fifteen subjects an electrode belongs to.

Extended Data Fig. 2 Speech-responsiveness.

Circles represent electrodes in the immediate vicinity of the auditory cortex, based on their 3D coordinates in the ‘fsaverage’ space. Filled circles indicate speech-responsiveness, meaning the electrode site responds significantly differently to speech compared to silence (see ‘Electrode selection’ in Methods). Non-responsiveness could mean the electrode is not sound-responsive, is sound-responsive but not speech-responsive, or has insufficient signal-to-noise ratio (SNR). The non-responsive electrodes were excluded from all analyses.
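The speech-versus-silence selection rule described in this caption can be sketched in Python. This is a minimal illustration with made-up array shapes and a hypothetical significance threshold; the paper's actual criteria are specified in the Methods ('Electrode selection'):

```python
import numpy as np
from scipy.stats import ttest_rel

def speech_responsive(speech_resp, silence_resp, alpha=0.01):
    """Boolean mask of electrodes whose responses differ significantly
    between speech and silence (paired t-test, one per electrode).

    Both inputs have shape (n_trials, n_electrodes) and hold per-trial
    mean responses under each condition; alpha is a hypothetical threshold.
    """
    _, p = ttest_rel(speech_resp, silence_resp, axis=0)
    return p < alpha

# Toy data: electrode 0 is strongly speech-responsive, electrode 1 is not.
rng = np.random.default_rng(0)
speech = rng.normal([1.0, 0.0], 0.1, size=(50, 2))
silence = rng.normal([0.0, 0.0], 0.1, size=(50, 2))
mask = speech_responsive(speech, silence)
```

Electrodes failing the test would then be dropped before any further analysis, mirroring the exclusion described above.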

Extended Data Fig. 3 Phonetic features.

Each phoneme is represented by 22 binary features based on its voicing, place, and manner of articulation features.

Extended Data Fig. 4 Selecting stimulus window length for prediction.

A window size of 510 ms was chosen to maximize linear model performance with the minimal number of parameters. We fit multiple models, each with a different number of time lags (window size), ranging from 60 ms to 760 ms. Each model was trained with the full list of predictors shown in Fig. 1c on all electrodes that met the selection criteria described in Methods (n = 242) and differed from the other models only in the number of lags. Error bars indicate the standard error (SE) over electrodes. To compare two window sizes, we performed a paired-sample one-tailed t-test on the cross-validated out-of-sample prediction r-values to determine whether the larger model improves upon the smaller one. The 510 ms model (dashed line) showed a significant improvement over all smaller models (60–410 ms, p < 0.001; 460 ms, p = 0.023). No larger model showed a significant improvement over the 510 ms model (p > 0.5).
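The window-size comparison above can be sketched as follows. The r-values here are synthetic stand-ins for the paper's cross-validated predictions; `scipy.stats.ttest_rel` with `alternative='greater'` implements the one-tailed paired test:

```python
import numpy as np
from scipy.stats import ttest_rel

def larger_window_helps(r_small, r_large):
    """One-tailed paired t-test: does enlarging the TRF window improve
    the cross-validated prediction r per electrode?

    r_small, r_large: arrays of shape (n_electrodes,).
    Returns the p-value for H1: mean(r_large - r_small) > 0.
    """
    return ttest_rel(r_large, r_small, alternative='greater').pvalue

# Synthetic example: a small but consistent gain across 242 electrodes,
# loosely mimicking the comparison of two window sizes in the caption.
rng = np.random.default_rng(1)
r_310 = rng.normal(0.30, 0.05, 242)           # hypothetical r at 310 ms
r_510 = r_310 + rng.normal(0.02, 0.01, 242)   # hypothetical gain at 510 ms
p = larger_window_helps(r_310, r_510)
```

Running this sweep over all candidate window sizes, and stopping at the smallest window that no larger model significantly beats, reproduces the selection logic described in the caption.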

Extended Data Fig. 5 Correlations among features.

The linguistic features defined in this study are themselves correlated with each other. This plot shows the absolute value of the Pearson correlation coefficient for all pairs of 1-dimensional linguistic features (22-dimensional phonetic features excluded from figure; L1: lexical entropy, L2: lexical surprisal). The correlations are computed on the same 30-minute dataset used for all other analyses. All correlations are statistically significant (p < 0.001).
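A minimal sketch of the pairwise |Pearson r| computation on 1-dimensional features; the time series below are synthetic stand-ins, not the paper's actual regressors:

```python
import numpy as np

def abs_feature_correlations(features):
    """Absolute Pearson correlation for every pair of 1-D feature
    time series. `features`: dict mapping name -> 1-D array (equal length)."""
    names = list(features)
    X = np.vstack([features[n] for n in names])
    R = np.abs(np.corrcoef(X))   # |r| matrix, rows/cols ordered as `names`
    return names, R

# Hypothetical stand-ins for three of the paper's 1-D features:
t = np.linspace(0, 1, 1000)
feats = {
    'word_frequency': np.sin(2 * np.pi * t),
    'lexical_surprisal': -np.sin(2 * np.pi * t) + 0.1 * np.cos(7 * t),
    'phonotactics': np.cos(2 * np.pi * t),
}
names, R = abs_feature_correlations(feats)
```

Taking the absolute value, as in the figure, treats strongly anticorrelated features (here, the first two series) the same as strongly correlated ones.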

Extended Data Fig. 6 Explained variance of TRF diversity.

The bootstrapped (n = 1,000) PCA analysis in Fig. 3 generates multiple eigenvectors at each bootstrap sample. We use the eigenvector that captures the most variance to compute the time courses (Fig. 3a) and peak latencies (Fig. 3b). This plot shows the mean and standard deviation of the explained variance for each of the top 10 principal components, computed using the same bootstrap procedure. In all cases, the first principal component captures roughly half of the total variance across all electrodes.
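The bootstrap procedure can be sketched as follows, with a toy electrodes-by-lags TRF matrix standing in for the paper's data (all shapes, noise levels and the bump-shaped latency profile are illustrative assumptions):

```python
import numpy as np

def bootstrap_pc1_variance(trfs, n_boot=1000, seed=0):
    """Bootstrap the fraction of variance captured by the first principal
    component of a (n_electrodes, n_lags) TRF matrix.

    Electrodes are resampled with replacement; at each sample the data are
    centred and the PC variances are read off the singular values.
    """
    rng = np.random.default_rng(seed)
    n = trfs.shape[0]
    ratios = np.empty(n_boot)
    for b in range(n_boot):
        sample = trfs[rng.integers(0, n, n)]    # resample electrodes
        X = sample - sample.mean(axis=0)        # centre before PCA
        s = np.linalg.svd(X, compute_uv=False)  # singular values
        var = s ** 2
        ratios[b] = var[0] / var.sum()          # PC1 explained-variance share
    return ratios.mean(), ratios.std()

# Toy TRFs: a shared latency profile with varying amplitude plus noise,
# so the first principal component dominates by construction.
rng = np.random.default_rng(2)
lags = np.linspace(0, 0.5, 50)
profile = np.exp(-((lags - 0.15) ** 2) / 0.01)
trfs = rng.normal(1, 0.5, (100, 1)) * profile + rng.normal(0, 0.1, (100, 50))
mean_ev, sd_ev = bootstrap_pc1_variance(trfs, n_boot=200)
```

The mean and standard deviation of `ratios` over bootstrap samples correspond to the values plotted in this extended-data figure for the first component.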

Supplementary information

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Keshishian, M., Akkol, S., Herrero, J. et al. Joint, distributed and hierarchically organized encoding of linguistic features in the human auditory cortex. Nat Hum Behav 7, 740–753 (2023). https://doi.org/10.1038/s41562-023-01520-0
