To derive meaning from sound, the brain must integrate information across many timescales. What computations underlie multiscale integration in human auditory cortex? Evidence suggests that auditory cortex analyses sound using both generic acoustic representations (for example, spectrotemporal modulation tuning) and category-specific computations, but the timescales over which these putatively distinct computations integrate remain unclear. To answer this question, we developed a general method to estimate sensory integration windows—the time window within which stimuli alter the neural response—and applied our method to intracranial recordings from neurosurgical patients. We show that human auditory cortex integrates hierarchically across diverse timescales spanning from ~50 to 400 ms. Moreover, we find that neural populations with short and long integration windows exhibit distinct functional properties: short-integration electrodes (less than ~200 ms) show prominent spectrotemporal modulation selectivity, while long-integration electrodes (greater than ~200 ms) show prominent category selectivity. These findings reveal how multiscale integration organizes auditory computation in the human brain.
The data supporting the findings of this study are available from the corresponding author upon request; access is restricted owing to the sensitive nature of human patient data. The TCI stimuli and the source data underlying key statistics and figures (Figs. 4 and 5) are available at this repository: https://github.com/snormanhaignere/NHB-TCI-source-data. Source data are also provided with this paper.
Code implementing the TCI analyses described in this paper is available at: https://github.com/snormanhaignere/TCI
Brodbeck, C., Hong, L. E. & Simon, J. Z. Rapid transformation from auditory to linguistic representations of continuous speech. Curr. Biol. 28, 3976–3983 (2018).
DeWitt, I. & Rauschecker, J. P. Phoneme and word recognition in the auditory ventral stream. Proc. Natl Acad. Sci. USA 109, E505–E514 (2012).
Hickok, G. & Poeppel, D. The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402 (2007).
Santoro, R. et al. Encoding of natural sounds at multiple spectral and temporal resolutions in the human auditory cortex. PLoS Comput. Biol. 10, e1003412 (2014).
Hullett, P. W., Hamilton, L. S., Mesgarani, N., Schreiner, C. E. & Chang, E. F. Human superior temporal gyrus organization of spectrotemporal modulation tuning derived from speech stimuli. J. Neurosci. 36, 2014–2026 (2016).
Schönwiesner, M. & Zatorre, R. J. Spectro-temporal modulation transfer function of single voxels in the human auditory cortex measured with high-resolution fMRI. Proc. Natl Acad. Sci. USA 106, 14611–14616 (2009).
Barton, B., Venezia, J. H., Saberi, K., Hickok, G. & Brewer, A. A. Orthogonal acoustic dimensions define auditory field maps in human cortex. Proc. Natl Acad. Sci. USA 109, 20738–20743 (2012).
Leaver, A. M. & Rauschecker, J. P. Cortical representation of natural complex sounds: effects of acoustic features and auditory object category. J. Neurosci. 30, 7604–7612 (2010).
Norman-Haignere, S. V., Kanwisher, N. G. & McDermott, J. H. Distinct cortical pathways for music and speech revealed by hypothesis-free voxel decomposition. Neuron 88, 1281–1296 (2015).
Kell, A. J., Yamins, D. L., Shook, E. N., Norman-Haignere, S. V. & McDermott, J. H. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron 98, 630–644 (2018).
Overath, T., McDermott, J. H., Zarate, J. M. & Poeppel, D. The cortical analysis of speech-specific temporal structure revealed by responses to sound quilts. Nat. Neurosci. 18, 903–911 (2015).
Davis, M. H. & Johnsrude, I. S. Hierarchical processing in spoken language comprehension. J. Neurosci. 23, 3423–3431 (2003).
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P. & Pike, B. Voice-selective areas in human auditory cortex. Nature 403, 309–312 (2000).
Zuk, N. J., Teoh, E. S. & Lalor, E. C. EEG-based classification of natural sounds reveals specialized responses to speech and music. NeuroImage 210, 116558 (2020).
Di Liberto, G. M., O’Sullivan, J. A. & Lalor, E. C. Low-frequency cortical entrainment to speech reflects phoneme-level processing. Curr. Biol. 25, 2457–2465 (2015).
Ding, N. et al. Temporal modulations in speech and music. Neurosci. Biobehav. Rev. 81, 181–187 (2017).
Elhilali, M. in Timbre: Acoustics, Perception, and Cognition (eds Siedenburg, K. et al.) 335–359 (Springer, 2019).
Patel, A. D. Music, Language, and the Brain (Oxford Univ. Press, 2007).
Norman-Haignere, S. V. & McDermott, J. H. Neural responses to natural and model-matched stimuli reveal distinct computations in primary and nonprimary auditory cortex. PLoS Biol. 16, e2005127 (2018).
Theunissen, F. & Miller, J. P. Temporal encoding in nervous systems: a rigorous definition. J. Comput. Neurosci. 2, 149–162 (1995).
Lerner, Y., Honey, C. J., Silbert, L. J. & Hasson, U. Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. J. Neurosci. 31, 2906–2915 (2011).
Chen, C., Read, H. L. & Escabí, M. A. Precise feature based time scales and frequency decorrelation lead to a sparse auditory code. J. Neurosci. 32, 8454–8468 (2012).
Meyer, A. F., Williamson, R. S., Linden, J. F. & Sahani, M. Models of neuronal stimulus-response functions: elaboration, estimation, and evaluation. Front. Syst. Neurosci. 10, 109 (2017).
Khatami, F. & Escabí, M. A. Spiking network optimized for word recognition in noise predicts auditory system hierarchy. PLoS Comput. Biol. 16, e1007558 (2020).
Harper, N. S. et al. Network receptive field modeling reveals extensive integration and multi-feature selectivity in auditory cortical neurons. PLoS Comput. Biol. 12, e1005113 (2016).
Keshishian, M. et al. Estimating and interpreting nonlinear receptive field of sensory neural responses with deep neural network models. eLife 9, e53445 (2020).
Albouy, P., Benjamin, L., Morillon, B. & Zatorre, R. J. Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody. Science 367, 1043–1047 (2020).
Flinker, A., Doyle, W. K., Mehta, A. D., Devinsky, O. & Poeppel, D. Spectrotemporal modulation provides a unifying framework for auditory cortical asymmetries. Nat. Hum. Behav. 3, 393–405 (2019).
Teng, X. & Poeppel, D. Theta and Gamma bands encode acoustic dynamics over wide-ranging timescales. Cereb. Cortex 30, 2600–2614 (2020).
Obleser, J., Eisner, F. & Kotz, S. A. Bilateral speech comprehension reflects differential sensitivity to spectral and temporal features. J. Neurosci. 28, 8116–8123 (2008).
Baumann, S. et al. The topography of frequency and time representation in primate auditory cortices. eLife 4, e03256 (2015).
Rogalsky, C., Rong, F., Saberi, K. & Hickok, G. Functional anatomy of language and music perception: temporal and structural factors investigated using functional magnetic resonance imaging. J. Neurosci. 31, 3843–3852 (2011).
Farbood, M. M., Heeger, D. J., Marcus, G., Hasson, U. & Lerner, Y. The neural processing of hierarchical structure in music and speech at different timescales. Front. Neurosci. 9, 157 (2015).
Angeloni, C. & Geffen, M. N. Contextual modulation of sound processing in the auditory cortex. Curr. Opin. Neurobiol. 49, 8–15 (2018).
Griffiths, T. D. et al. Direct recordings of pitch responses from human auditory cortex. Curr. Biol. 20, 1128–1132 (2010).
Mesgarani, N., Cheung, C., Johnson, K. & Chang, E. F. Phonetic feature encoding in human superior temporal gyrus. Science 343, 1006–1010 (2014).
Ray, S. & Maunsell, J. H. R. Different origins of gamma rhythm and high-gamma activity in macaque visual cortex. PLoS Biol. 9, e1000610 (2011).
Manning, J. R., Jacobs, J., Fried, I. & Kahana, M. J. Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. J. Neurosci. 29, 13613–13620 (2009).
Slaney, M. Auditory Toolbox Version 2. Tech. Rep. 1998-010 (Interval Research Corporation, 1998).
Chi, T., Ru, P. & Shamma, S. A. Multiresolution spectrotemporal analysis of complex sounds. J. Acoust. Soc. Am. 118, 887–906 (2005).
Singh, N. C. & Theunissen, F. E. Modulation spectra of natural sounds and ethological theories of auditory processing. J. Acoust. Soc. Am. 114, 3394–3411 (2003).
Di Liberto, G. M., Wong, D., Melnik, G. A. & de Cheveigné, A. Low-frequency cortical responses to natural speech reflect probabilistic phonotactics. Neuroimage 196, 237–247 (2019).
Leonard, M. K., Bouchard, K. E., Tang, C. & Chang, E. F. Dynamic encoding of speech sequence probability in human temporal cortex. J. Neurosci. 35, 7203–7214 (2015).
Schoppe, O., Harper, N. S., Willmore, B. D., King, A. J. & Schnupp, J. W. Measuring the performance of neural models. Front. Comput. Neurosci. 10, 10 (2016).
Mizrahi, A., Shalev, A. & Nelken, I. Single neuron and population coding of natural sounds in auditory cortex. Curr. Opin. Neurobiol. 24, 103–110 (2014).
Chien, H.-Y. S. & Honey, C. J. Constructing and forgetting temporal context in the human cerebral cortex. Neuron 106, 675–686 (2020).
Panzeri, S., Brunel, N., Logothetis, N. K. & Kayser, C. Sensory neural codes using multiplexed temporal scales. Trends Neurosci. 33, 111–120 (2010).
Joris, P. X., Schreiner, C. E. & Rees, A. Neural processing of amplitude-modulated sounds. Physiol. Rev. 84, 541–577 (2004).
Wang, X., Lu, T., Bendor, D. & Bartlett, E. Neural coding of temporal information in auditory thalamus and cortex. Neuroscience 154, 294–303 (2008).
Gao, X. & Wehr, M. A coding transformation for temporally structured sounds within auditory cortical neurons. Neuron 86, 292–303 (2015).
McDermott, J. H. & Simoncelli, E. P. Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis. Neuron 71, 926–940 (2011).
Cohen, M. R. & Kohn, A. Measuring and interpreting neuronal correlations. Nat. Neurosci. 14, 811–819 (2011).
Murray, J. D. et al. A hierarchy of intrinsic timescales across primate cortex. Nat. Neurosci. 17, 1661–1663 (2014).
Chaudhuri, R., Knoblauch, K., Gariel, M.-A., Kennedy, H. & Wang, X.-J. A large-scale circuit mechanism for hierarchical dynamical processing in the primate cortex. Neuron 88, 419–431 (2015).
Rauschecker, J. P. & Scott, S. K. Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat. Neurosci. 12, 718–724 (2009).
Sharpee, T. O., Atencio, C. A. & Schreiner, C. E. Hierarchical representations in the auditory cortex. Curr. Opin. Neurobiol. 21, 761–767 (2011).
Zatorre, R. J., Belin, P. & Penhune, V. B. Structure and function of auditory cortex: music and speech. Trends Cogn. Sci. 6, 37–46 (2002).
Poeppel, D. The analysis of speech in different temporal integration windows: cerebral lateralization as ‘asymmetric sampling in time’. Speech Commun. 41, 245–255 (2003).
Hamilton, L. S., Oganian, Y., Hall, J. & Chang, E. F. Parallel and distributed encoding of speech across human auditory cortex. Cell 184, 4626–4639 (2021).
Nourski, K. V. et al. Functional organization of human auditory cortex: investigation of response latencies through direct recordings. NeuroImage 101, 598–609 (2014).
Bartlett, E. L. The organization and physiology of the auditory thalamus and its role in processing acoustic features important for speech perception. Brain Lang. 126, 29–48 (2013).
Gattass, R., Gross, C. G. & Sandell, J. H. Visual topography of V2 in the macaque. J. Comp. Neurol. 201, 519–539 (1981).
Dumoulin, S. O. & Wandell, B. A. Population receptive field estimates in human visual cortex. Neuroimage 39, 647–660 (2008).
Ding, N., Melloni, L., Zhang, H., Tian, X. & Poeppel, D. Cortical tracking of hierarchical linguistic structures in connected speech. Nat. Neurosci. 19, 158–164 (2016).
Suied, C., Agus, T. R., Thorpe, S. J., Mesgarani, N. & Pressnitzer, D. Auditory gist: recognition of very short sounds from timbre cues. J. Acoust. Soc. Am. 135, 1380–1391 (2014).
Donhauser, P. W. & Baillet, S. Two distinct neural timescales for predictive speech processing. Neuron 105, 385–393 (2020).
Ulanovsky, N., Las, L., Farkas, D. & Nelken, I. Multiple time scales of adaptation in auditory cortex neurons. J. Neurosci. 24, 10440–10453 (2004).
Lu, K. et al. Implicit memory for complex sounds in higher auditory cortex of the ferret. J. Neurosci. 38, 9955–9966 (2018).
Chew, S. J., Mello, C., Nottebohm, F., Jarvis, E. & Vicario, D. S. Decrements in auditory responses to a repeated conspecific song are long-lasting and require two periods of protein synthesis in the songbird forebrain. Proc. Natl Acad. Sci. USA 92, 3406–3410 (1995).
Bianco, R. et al. Long-term implicit memory for sequential auditory patterns in humans. eLife 9, e56073 (2020).
Miller, K. J., Honey, C. J., Hermes, D., Rao, R. P. & Ojemann, J. G. Broadband changes in the cortical surface potential track activation of functionally diverse neuronal populations. Neuroimage 85, 711–720 (2014).
Leszczyński, M. et al. Dissociation of broadband high-frequency activity and neuronal firing in the neocortex. Sci. Adv. 6, eabb0977 (2020).
Günel, B., Thiel, C. M. & Hildebrandt, K. J. Effects of exogenous auditory attention on temporal and spectral resolution. Front. Psychol. 9, 1984 (2018).
Norman-Haignere, S. V. et al. Pitch-responsive cortical regions in congenital amusia. J. Neurosci. 36, 2986–2994 (2016).
Norman-Haignere, S. et al. Intracranial recordings from human auditory cortex reveal a neural population selective for musical song. Preprint at bioRxiv https://doi.org/10.1101/696161 (2020).
Boebinger, D., Norman-Haignere, S. V., McDermott, J. H. & Kanwisher, N. Music-selective neural populations arise without musical training. J. Neurophysiol. 125, 2237–2263 (2021).
Morosan, P. et al. Human primary auditory cortex: cytoarchitectonic subdivisions and mapping into a spatial reference system. Neuroimage 13, 684–701 (2001).
Baumann, S., Petkov, C. I. & Griffiths, T. D. A unified framework for the organization of the primate auditory cortex. Front. Syst. Neurosci. 7, 11 (2013).
Barr, D. J., Levy, R., Scheepers, C. & Tily, H. J. Random effects structure for confirmatory hypothesis testing: keep it maximal. J. Mem. Lang. 68, 255–278 (2013).
Kuznetsova, A., Brockhoff, P. B. & Christensen, R. H. lmerTest package: tests in linear mixed effects models. J. Stat. Softw. 82, 1–26 (2017).
Gelman, A. & Hill, J. Data Analysis Using Regression and Multilevel/Hierarchical Models (Cambridge Univ. Press, 2006).
Schielzeth, H. et al. Robustness of linear mixed-effects models to violations of distributional assumptions. Methods Ecol. Evol. 11, 1141–1152 (2020).
de Cheveigné, A. & Parra, L. C. Joint decorrelation, a versatile tool for multichannel data analysis. Neuroimage 98, 487–505 (2014).
Murphy, K. P. Machine Learning: A Probabilistic Perspective (MIT Press, 2012).
de Heer, W. A., Huth, A. G., Griffiths, T. L., Gallant, J. L. & Theunissen, F. E. The hierarchical cortical organization of human speech processing. J. Neurosci. 37, 6539–6557 (2017).
Marquardt, D. W. An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math. 11, 431–441 (1963).
Fisher, W. M. tsylb: NIST syllabification software, version 2 revised (1997).
We thank D. Maksumov, N. Agrawal, S. Montenegro, L. Yu, M. Leszczynski and I. Tal for help with data collection, S. Montenegro and H. Wang for help in localizing electrodes and A. Kell, S. David, J. McDermott, B. Conway, N. Kanwisher, N. Kriegeskorte and M. Leszczynski for comments on an earlier draft of this manuscript. This study was supported by the National Institutes of Health (NIDCD-K99-DC018051 to S.V.N.-H., NIDCD-R01-DC014279 to N.M., S10 OD018211 to N.M., NINDS-R01-NS084142 to C.A.S. and NIDCD-R01-DC018805 to N.M./A.F.) and the Howard Hughes Medical Institute (LSRF postdoctoral award to S.V.N.-H.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
The authors declare no competing interests.
Peer review information
Nature Human Behaviour thanks Jérémy Giroud, Jonas Obleser, Benjamin Morillon and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Histogram of phoneme, syllable, and word durations in TIMIT.
Durations of phonemes, multi-phoneme syllables and multi-syllable words in the commonly used TIMIT database. Phonemes and words are labeled in the database; syllables were computed from the phoneme labels using the software tsylb2 (ref. 87). The median duration for each structure is 64, 197 and 479 ms, respectively.
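This tabulation can be sketched from TIMIT-style `.phn` label files, which list start/end sample pairs at 16 kHz for each phone (the helper name, toy lines and silence-label list below are our own, purely illustrative):

```python
from statistics import median

def phone_durations_ms(phn_lines, fs=16000):
    """Phoneme durations (ms) from TIMIT-style .phn lines of the form
    'start end label', with start/end given in samples at rate fs."""
    durations = []
    for line in phn_lines:
        start, end, label = line.split()
        if label not in ("h#", "pau", "epi"):  # skip silence/pause markers
            durations.append((int(end) - int(start)) / fs * 1000.0)
    return durations

# Toy example: 'sh' spans 960 samples (60 ms), 'iy' spans 1,120 samples (70 ms)
demo = ["0 1600 h#", "1600 2560 sh", "2560 3680 iy", "3680 5000 h#"]
durs = phone_durations_ms(demo)
med = median(durs)
```

The same loop over all sentences in the corpus yields the duration histograms summarized in this figure.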
Extended Data Fig. 2 Cross-context correlation for 20 representative electrodes.
Electrodes were selected to illustrate the diversity of integration windows. Specifically, we partitioned all sound-responsive electrodes into 5 groups based on the width of their integration window, estimated using a model (Fig. 3 illustrates the model). For each group, we plot the four electrodes with the highest SNR (as measured by the test-retest correlation across the sound set). Electrodes have been sorted by their integration width, which is indicated to the right of each plot, along with the location, hemisphere and subject number for each electrode. Each plot shows the cross-context correlation and noise ceiling for a single electrode and segment duration (indicated above each column). Because the number of segments was inversely proportional to segment duration, there were more segments at the shorter durations, and the cross-context correlation and noise ceiling were correspondingly more stable. This stability is valuable because the shorter segment durations have fewer relevant time lags, so reliability at those lags matters more. The model used to estimate integration windows pooled across all lags and segment durations, taking into account the reliability of each data point.
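The core of the cross-context correlation can be sketched as follows (a simplified illustration, not the paper's released TCI code; the function name and toy data are our own): responses time-locked to the same segments are correlated across presentations in two different random contexts, lag by lag.

```python
import numpy as np

def cross_context_corr(resp_a, resp_b):
    """Lag-by-lag correlation, across segments, of responses to identical
    segments heard in two different contexts.

    resp_a, resp_b : (n_segments, n_lags) arrays time-locked to segment
        onset; rows index segments, columns index lags after onset.
    """
    n_lags = resp_a.shape[1]
    return np.array([
        np.corrcoef(resp_a[:, lag], resp_b[:, lag])[0, 1]
        for lag in range(n_lags)
    ])

# Toy check: a response driven only by the segment (context-invariant)
# yields high cross-context correlation at every lag.
rng = np.random.default_rng(0)
segment_drive = rng.standard_normal((30, 20))      # 30 segments, 20 lags
resp_a = segment_drive + 0.1 * rng.standard_normal((30, 20))
resp_b = segment_drive + 0.1 * rng.standard_normal((30, 20))
corr = cross_context_corr(resp_a, resp_b)
```

Lags at which the correlation approaches the noise ceiling are those at which the response depends only on the segment, not its context; lags at which it falls below the ceiling reveal context dependence and hence the integration window.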
Extended Data Fig. 3 Simulation results.
a, Integration windows estimated from four different model responses (from top to bottom): (1) a model that integrated waveform magnitudes within a known window; (2) a model that integrated energy within a cochlear frequency band; (3) a model that integrated spectrotemporal energy in a cochleagram representation of sound; and (4) a simple, deep neural network. All models had a ground-truth, Gamma-distributed integration window. We independently varied the width and centre of the integration window (excluding non-causal combinations) and tested whether we could infer the ground-truth values. Results are shown for several different SNRs, as measured by the test-retest correlation of the response across repetitions, the same metric used to select electrodes (we selected electrodes with a test-retest correlation greater than 0.1). Black dots correspond to a single model window/simulation. Red dots show the median estimate across all windows/simulations. Some models included more variants (for example, different spectrotemporal filters), which is why some plots have a higher dot density. There is a small upward bias for very narrow integration widths (31 ms), probably due to the filter used to measure broadband gamma, which has an integration width of ~19 ms. The integration widths of our electrodes (~50 to 400 ms) were mostly above the point at which this bias would have a substantial effect, and the bias works against our observed results, since it compresses the possible range of integration widths. b, Integration windows estimated without explicitly modeling and accounting for boundary effects. Results are shown for the spectrotemporal model, which produces strong responses at the boundary between two segments due to prominent spectrotemporal changes. Note that integration widths, in particular, show a nontrivial upward bias when boundary effects are not accounted for (see Methods for a more detailed discussion).
c, Integration windows estimated without accounting for an upward bias in the squared error loss. The bias grows as the SNR decreases (see Methods for an explanation). Results are shown for the waveform amplitude model, but the bias is present for all models since it is caused by the loss. Our bias-corrected loss largely corrected the problem, as can be observed in panel a.
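The logic of these simulations — a response that integrates a stimulus feature within a causal, Gamma-shaped window — can be sketched as follows (our own minimal parameterization by Gamma shape and scale, not the paper's width/centre parameterization):

```python
import numpy as np
from math import gamma as gamma_fn

def gamma_window(shape_k, scale_ms, dur_ms=500, dt_ms=1.0):
    """Causal Gamma-shaped window sampled at dt_ms; weights sum to ~1."""
    t = np.arange(dt_ms, dur_ms + dt_ms, dt_ms)
    pdf = (t ** (shape_k - 1) * np.exp(-t / scale_ms)
           / (gamma_fn(shape_k) * scale_ms ** shape_k))
    return pdf * dt_ms

def simulate_response(envelope, window):
    """Model response: running weighted average of the stimulus history."""
    return np.convolve(envelope, window)[: len(envelope)]

win = gamma_window(shape_k=3.0, scale_ms=30.0)   # mean lag = shape * scale = 90 ms
env = np.abs(np.random.default_rng(1).standard_normal(2000))  # toy 1 ms envelope
resp = simulate_response(env, win)
```

Applying the cross-context analysis to such simulated responses, where the window is known exactly, is what allows the recovery accuracy shown in this figure to be quantified.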
Extended Data Fig. 4 Integration windows for different electrode types and subjects.
a, This panel plots integration widths (left) and centres (right) for individual electrodes as a function of distance to primary auditory cortex, defined as posteromedial Heschl’s gyrus. The electrodes have been labeled by their type (grid, depth, strip). The grid/strip electrodes were located further from primary auditory cortex on average but, given their location, did not show any obvious difference in integration properties. The effect of distance was significant for the depth electrodes alone (the most numerous type of electrode) when excluding grids and strips (width: F(1,14.53) = 24.51, p < 0.001, β(distance) = 0.065 octaves/mm, CI = [0.039, 0.090]; centre: F(1,12.83) = 27.76, p < 0.001, β(distance) = 0.052 octaves/mm, CI = [0.032, 0.071]; N = 114 electrodes). To be conservative, electrode type was included as a covariate in the linear mixed-effects model used to assess significance as a whole. b, Same as panel a but indicating subject membership instead of electrode type. Each symbol corresponds to a unique subject. The effect of distance on integration windows is broadly distributed across the 18 subjects.
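The statistical model behind these tests can be sketched with `statsmodels` on synthetic stand-in data (a simplified analogue: the paper fit its models with lmerTest in R and included additional covariates such as electrode type; octaves here are log2 units of integration width):

```python
# pip install pandas statsmodels
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data: log2 integration width grows with distance from
# primary auditory cortex at ~0.06 octaves/mm, with per-subject offsets.
rng = np.random.default_rng(2)
n = 150
subjects = rng.integers(0, 18, n).astype(str)
offsets = dict(zip(map(str, range(18)), 0.2 * rng.standard_normal(18)))
df = pd.DataFrame({"distance_mm": rng.uniform(0, 40, n), "subject": subjects})
df["log2_width"] = (np.log2(100) + 0.06 * df["distance_mm"]
                    + df["subject"].map(offsets)
                    + 0.3 * rng.standard_normal(n))

# Linear mixed model: fixed effect of distance, random intercept per subject
fit = smf.mixedlm("log2_width ~ distance_mm", df, groups=df["subject"]).fit()
slope = fit.params["distance_mm"]   # estimated octaves/mm
```

The random intercept per subject is what lets electrodes be pooled across patients without treating them as independent samples.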
Extended Data Fig. 5 Robustness analyses.
a, Sound segments were excerpted from 10 sounds. This panel shows integration windows estimated using segments drawn from two non-overlapping splits of 5 sounds each (listed on the left). Since many non-primary regions only respond strongly to speech or music8,9,11, we included speech and music in both splits. Format is analogous to Fig. 4 but only showing integration widths (integration centres were also similar across analysis variants). The effect of distance was significant for both splits (split 1: F(1,12.660) = 40.20, p < 0.001, β(distance) = 0.069 octaves/mm, CI = [0.047, 0.090], N = 136 electrodes; split 2: F(1,21.66) = 30.11, p < 0.001, β(distance) = 0.066 octaves/mm, CI = [0.043, 0.090], N = 135 electrodes). b, Shorter segments were created by subdividing longer segments, which made it possible to consider two types of context (see schematic): (1) random context, in which each segment is surrounded by random other segments; and (2) natural context, in which a segment is a subset of a longer segment and is thus surrounded by its natural context. When comparing responses across contexts, one of the two contexts must be random so that the contexts differ, but the other context can be random or natural. Our main analyses pooled across both types of comparison. Here, we show integration widths estimated by comparing either purely random contexts (top panel) or random and natural contexts (bottom panel). The effect of distance was significant for both types of context comparison (random–random: F(1,28.056) = 30.01, p < 0.001, β(distance) = 0.064 octaves/mm, CI = [0.041, 0.087], N = 121 electrodes; random–natural: F(1,18.816) = 27.087, p < 0.001, β(distance) = 0.062 octaves/mm, CI = [0.039, 0.086], N = 154 electrodes). c, We modeled integration windows using window shapes that varied from more exponential to more Gaussian (the parameter γ in equations 2 and 3 controls the shape of the window; see Methods).
For our main analysis, we selected the shape that yielded the best prediction for each electrode. This panel shows integration widths estimated using two different fixed shapes. The effect of distance was significant for both shapes (γ = 1: F(1,21.712) = 24.85, p < 0.001, β(distance) = 0.067 octaves/mm, CI = [0.040, 0.093], N = 154 electrodes; γ = 4: F(1,20.973) = 19.38, p < 0.001, β(distance) = 0.055 octaves/mm, CI = [0.031, 0.080], N = 154 electrodes). d, Similar results were obtained using two different frequency ranges to measure gamma power (70–100 Hz: F(1,21.05) = 19.38, p < 0.001, β(distance) = 0.058 octaves/mm, CI = [0.032, 0.083], N = 133 electrodes; 100–140 Hz: F(1,20.56) = 12.57, p < 0.01, β(distance) = 0.051 octaves/mm, CI = [0.023, 0.080], N = 131 electrodes).
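The exponential-to-Gaussian shape family can be illustrated with Gamma-distributed windows, whose shape parameter plays the role of γ here (a sketch of the general idea, not the paper's exact equations 2 and 3): γ = 1 gives an exponentially decaying window, while larger γ gives a more symmetric, Gaussian-like window.

```python
import numpy as np
from math import gamma as gamma_fn

def gamma_pdf(t, shape_k, scale):
    """Gamma probability density evaluated at times t (t > 0)."""
    return (t ** (shape_k - 1) * np.exp(-t / scale)
            / (gamma_fn(shape_k) * scale ** shape_k))

t = np.arange(1.0, 1000.0)                           # lags in ms
exp_like = gamma_pdf(t, shape_k=1.0, scale=50.0)     # γ = 1: exponential decay
gauss_like = gamma_pdf(t, shape_k=4.0, scale=50.0)   # γ = 4: near-Gaussian

# γ = 1 decays monotonically from onset; γ = 4 peaks away from onset,
# at (γ - 1) * scale = 150 ms, with skewness 2 / sqrt(γ) shrinking as γ grows.
```

Because the skewness shrinks as 2/√γ, fixing γ = 1 versus γ = 4 spans the range of window shapes tested in this panel.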
Extended Data Fig. 6 Relationship between integration widths and centres without any causality constraint.
Extended Data Fig. 7 Components most selective for sound categories at different integration widths.
Electrodes were subdivided into three equally sized groups based on the width of their integration window. The time-averaged response of each electrode was then projected onto the top 2 components that showed the greatest category selectivity, measured using linear discriminant analysis (each circle corresponds to a unique sound). Same format as Fig. 5b, which plots responses projected onto the top 2 principal components. Half of the sounds were used to compute the components, and the other half were used to measure their response to avoid statistical circularity. As a consequence, there are half as many sounds as in Fig. 5b.
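The component analysis can be sketched with scikit-learn's linear discriminant analysis (synthetic data and our own split-half setup, purely illustrative of the circularity-avoiding logic described above):

```python
# pip install scikit-learn
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
n_sounds, n_electrodes = 60, 40
labels = np.tile([0, 1, 2], n_sounds // 3)     # e.g. speech / music / other
X = rng.standard_normal((n_sounds, n_electrodes))  # time-averaged responses
X[labels == 0, :5] += 2.0                      # inject category structure

# Avoid circularity: fit the discriminant components on one half of the
# sounds, then project only the held-out half onto them.
train, test = np.arange(0, 30), np.arange(30, 60)
lda = LinearDiscriminantAnalysis(n_components=2).fit(X[train], labels[train])
projected = lda.transform(X[test])             # top 2 category components
```

Only the held-out projections are plotted, which is why half as many sounds appear here as in Fig. 5b.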
Extended Data Fig. 8 Results for integration-matched responses.
a, For our functional selectivity analyses, we subdivided the electrodes into three equally sized groups based on the width of their integration window. To test whether our results were an inevitable consequence of differences in temporal integration, we matched the integration windows across the electrodes in each group. Matching was performed by integrating the responses from the electrodes in the short and intermediate groups within an appropriately chosen window, such that the resulting integration window matched those of the longest group (see Integration matching in Methods). This figure plots a histogram of the effective integration windows after matching. b–d, These panels show the results of applying our functional selectivity analyses to integration-matched responses. Format is the same as Fig. 5b–d.
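The matching step rests on the fact that smoothing a response with an extra window yields an effective integration window equal to the convolution of the neural window and the smoothing window. A toy illustration (our own boxcar example at 1 ms sampling, not the paper's exact matching kernels):

```python
import numpy as np

# A ~50 ms "neural" window cascaded with a ~150 ms smoothing window gives an
# effective window whose mass is spread over ~200 ms (a 199-sample trapezoid).
neural = np.ones(50) / 50
smoothing = np.ones(150) / 150
effective = np.convolve(neural, smoothing)

def match_integration(resp, window):
    """Smooth a response with an extra causal window (unit-area weights),
    lengthening its effective integration window."""
    w = window / window.sum()
    return np.convolve(resp, w)[: len(resp)]
```

Choosing the smoothing window so that the effective window's width matches that of the long-integration group is what produces the histogram in panel a.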
Supplementary Fig. 1.
Source Data Fig. 4
Integration widths and centres for all electrodes along with their distance to primary auditory cortex and relevant metadata (that is, hemisphere, subject ID and electrode type).
Source Data Fig. 5
Principal component loadings plotted in Fig. 5b. Prediction accuracies for acoustic features and category labels for all electrodes along with relevant metadata (that is, hemisphere, subject ID, electrode type and reliability ceiling).
Cite this article
Norman-Haignere, S.V., Long, L.K., Devinsky, O. et al. Multiscale temporal integration organizes hierarchical computation in human auditory cortex. Nat Hum Behav 6, 455–469 (2022). https://doi.org/10.1038/s41562-021-01261-y