
Multiscale temporal integration organizes hierarchical computation in human auditory cortex


Abstract

To derive meaning from sound, the brain must integrate information across many timescales. What computations underlie multiscale integration in human auditory cortex? Evidence suggests that auditory cortex analyses sound using both generic acoustic representations (for example, spectrotemporal modulation tuning) and category-specific computations, but the timescales over which these putatively distinct computations integrate remain unclear. To answer this question, we developed a general method to estimate sensory integration windows—the time window when stimuli alter the neural response—and applied our method to intracranial recordings from neurosurgical patients. We show that human auditory cortex integrates hierarchically across diverse timescales spanning from ~50 to 400 ms. Moreover, we find that neural populations with short and long integration windows exhibit distinct functional properties: short-integration electrodes (less than ~200 ms) show prominent spectrotemporal modulation selectivity, while long-integration electrodes (greater than ~200 ms) show prominent category selectivity. These findings reveal how multiscale integration organizes auditory computation in the human brain.


Fig. 1: TCI paradigm.
Fig. 2: Cross-context correlation.
Fig. 3: Model-estimated integration windows.
Fig. 4: Anatomy of model-estimated integration windows.
Fig. 5: Functional selectivity in electrodes with differing integration windows.

Data availability

Source data are also provided with this paper. The data supporting the findings of this study are available from the corresponding author upon request, owing to the sensitive nature of human patient data. The TCI stimuli and the source data underlying key statistics and figures (Figs. 4 and 5) are available at this repository:

Code availability

Code implementing the TCI analyses described in this paper is available at:


  1. Brodbeck, C., Hong, L. E. & Simon, J. Z. Rapid transformation from auditory to linguistic representations of continuous speech. Curr. Biol. 28, 3976–3983 (2018).


  2. DeWitt, I. & Rauschecker, J. P. Phoneme and word recognition in the auditory ventral stream. Proc. Natl Acad. Sci. USA 109, E505–E514 (2012).


  3. Hickok, G. & Poeppel, D. The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402 (2007).


  4. Santoro, R. et al. Encoding of natural sounds at multiple spectral and temporal resolutions in the human auditory cortex. PLoS Comput. Biol. 10, e1003412 (2014).


  5. Hullett, P. W., Hamilton, L. S., Mesgarani, N., Schreiner, C. E. & Chang, E. F. Human superior temporal gyrus organization of spectrotemporal modulation tuning derived from speech stimuli. J. Neurosci. 36, 2014–2026 (2016).


  6. Schönwiesner, M. & Zatorre, R. J. Spectro-temporal modulation transfer function of single voxels in the human auditory cortex measured with high-resolution fMRI. Proc. Natl Acad. Sci. USA 106, 14611–14616 (2009).


  7. Barton, B., Venezia, J. H., Saberi, K., Hickok, G. & Brewer, A. A. Orthogonal acoustic dimensions define auditory field maps in human cortex. Proc. Natl Acad. Sci. USA 109, 20738–20743 (2012).


  8. Leaver, A. M. & Rauschecker, J. P. Cortical representation of natural complex sounds: effects of acoustic features and auditory object category. J. Neurosci. 30, 7604–7612 (2010).


  9. Norman-Haignere, S. V., Kanwisher, N. G. & McDermott, J. H. Distinct cortical pathways for music and speech revealed by hypothesis-free voxel decomposition. Neuron 88, 1281–1296 (2015).


  10. Kell, A. J., Yamins, D. L., Shook, E. N., Norman-Haignere, S. V. & McDermott, J. H. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron 98, 630–644 (2018).

  11. Overath, T., McDermott, J. H., Zarate, J. M. & Poeppel, D. The cortical analysis of speech-specific temporal structure revealed by responses to sound quilts. Nat. Neurosci. 18, 903–911 (2015).


  12. Davis, M. H. & Johnsrude, I. S. Hierarchical processing in spoken language comprehension. J. Neurosci. 23, 3423–3431 (2003).


  13. Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P. & Pike, B. Voice-selective areas in human auditory cortex. Nature 403, 309–312 (2000).


  14. Zuk, N. J., Teoh, E. S. & Lalor, E. C. EEG-based classification of natural sounds reveals specialized responses to speech and music. NeuroImage 210, 116558 (2020).


  15. Di Liberto, G. M., O’Sullivan, J. A. & Lalor, E. C. Low-frequency cortical entrainment to speech reflects phoneme-level processing. Curr. Biol. 25, 2457–2465 (2015).


  16. Ding, N. et al. Temporal modulations in speech and music. Neurosci. Biobehav. Rev. 81, 181–187 (2017).

  17. Elhilali, M. in Timbre: Acoustics, Perception, and Cognition (eds Siedenburg, K. et al.) 335–359 (Springer, 2019).

  18. Patel, A. D. Music, Language, and the Brain (Oxford Univ. Press, 2007).

  19. Norman-Haignere, S. V. & McDermott, J. H. Neural responses to natural and model-matched stimuli reveal distinct computations in primary and nonprimary auditory cortex. PLoS Biol. 16, e2005127 (2018).


  20. Theunissen, F. & Miller, J. P. Temporal encoding in nervous systems: a rigorous definition. J. Comput. Neurosci. 2, 149–162 (1995).


  21. Lerner, Y., Honey, C. J., Silbert, L. J. & Hasson, U. Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. J. Neurosci. 31, 2906–2915 (2011).


  22. Chen, C., Read, H. L. & Escabí, M. A. Precise feature based time scales and frequency decorrelation lead to a sparse auditory code. J. Neurosci. 32, 8454–8468 (2012).


  23. Meyer, A. F., Williamson, R. S., Linden, J. F. & Sahani, M. Models of neuronal stimulus-response functions: elaboration, estimation, and evaluation. Front. Syst. Neurosci. 10, 109 (2017).


  24. Khatami, F. & Escabí, M. A. Spiking network optimized for word recognition in noise predicts auditory system hierarchy. PLoS Comput. Biol. 16, e1007558 (2020).


  25. Harper, N. S. et al. Network receptive field modeling reveals extensive integration and multi-feature selectivity in auditory cortical neurons. PLoS Comput. Biol. 12, e1005113 (2016).


  26. Keshishian, M. et al. Estimating and interpreting nonlinear receptive field of sensory neural responses with deep neural network models. eLife 9, e53445 (2020).


  27. Albouy, P., Benjamin, L., Morillon, B. & Zatorre, R. J. Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody. Science 367, 1043–1047 (2020).


  28. Flinker, A., Doyle, W. K., Mehta, A. D., Devinsky, O. & Poeppel, D. Spectrotemporal modulation provides a unifying framework for auditory cortical asymmetries. Nat. Hum. Behav. 3, 393–405 (2019).

  29. Teng, X. & Poeppel, D. Theta and Gamma bands encode acoustic dynamics over wide-ranging timescales. Cereb. Cortex 30, 2600–2614 (2020).


  30. Obleser, J., Eisner, F. & Kotz, S. A. Bilateral speech comprehension reflects differential sensitivity to spectral and temporal features. J. Neurosci. 28, 8116–8123 (2008).


  31. Baumann, S. et al. The topography of frequency and time representation in primate auditory cortices. eLife 4, e03256 (2015).


  32. Rogalsky, C., Rong, F., Saberi, K. & Hickok, G. Functional anatomy of language and music perception: temporal and structural factors investigated using functional magnetic resonance imaging. J. Neurosci. 31, 3843–3852 (2011).


  33. Farbood, M. M., Heeger, D. J., Marcus, G., Hasson, U. & Lerner, Y. The neural processing of hierarchical structure in music and speech at different timescales. Front. Neurosci. 9, 157 (2015).


  34. Angeloni, C. & Geffen, M. N. Contextual modulation of sound processing in the auditory cortex. Curr. Opin. Neurobiol. 49, 8–15 (2018).


  35. Griffiths, T. D. et al. Direct recordings of pitch responses from human auditory cortex. Curr. Biol. 20, 1128–1132 (2010).


  36. Mesgarani, N., Cheung, C., Johnson, K. & Chang, E. F. Phonetic feature encoding in human superior temporal gyrus. Science 343, 1006–1010 (2014).


  37. Ray, S. & Maunsell, J. H. R. Different origins of gamma rhythm and high-gamma activity in macaque visual cortex. PLoS Biol. 9, e1000610 (2011).


  38. Manning, J. R., Jacobs, J., Fried, I. & Kahana, M. J. Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. J. Neurosci. 29, 13613–13620 (2009).


  39. Slaney, M. Auditory Toolbox, Version 2. Technical Report 1998-010 (Interval Research Corporation, 1998).


  40. Chi, T., Ru, P. & Shamma, S. A. Multiresolution spectrotemporal analysis of complex sounds. J. Acoust. Soc. Am. 118, 887–906 (2005).


  41. Singh, N. C. & Theunissen, F. E. Modulation spectra of natural sounds and ethological theories of auditory processing. J. Acoust. Soc. Am. 114, 3394–3411 (2003).


  42. Di Liberto, G. M., Wong, D., Melnik, G. A. & de Cheveigné, A. Low-frequency cortical responses to natural speech reflect probabilistic phonotactics. Neuroimage 196, 237–247 (2019).


  43. Leonard, M. K., Bouchard, K. E., Tang, C. & Chang, E. F. Dynamic encoding of speech sequence probability in human temporal cortex. J. Neurosci. 35, 7203–7214 (2015).


  44. Schoppe, O., Harper, N. S., Willmore, B. D., King, A. J. & Schnupp, J. W. Measuring the performance of neural models. Front. Comput. Neurosci. 10, 10 (2016).


  45. Mizrahi, A., Shalev, A. & Nelken, I. Single neuron and population coding of natural sounds in auditory cortex. Curr. Opin. Neurobiol. 24, 103–110 (2014).


  46. Chien, H.-Y. S. & Honey, C. J. Constructing and forgetting temporal context in the human cerebral cortex. Neuron 106, 675–686 (2020).

  47. Panzeri, S., Brunel, N., Logothetis, N. K. & Kayser, C. Sensory neural codes using multiplexed temporal scales. Trends Neurosci. 33, 111–120 (2010).


  48. Joris, P. X., Schreiner, C. E. & Rees, A. Neural processing of amplitude-modulated sounds. Physiol. Rev. 84, 541–577 (2004).


  49. Wang, X., Lu, T., Bendor, D. & Bartlett, E. Neural coding of temporal information in auditory thalamus and cortex. Neuroscience 154, 294–303 (2008).


  50. Gao, X. & Wehr, M. A coding transformation for temporally structured sounds within auditory cortical neurons. Neuron 86, 292–303 (2015).


  51. McDermott, J. H. & Simoncelli, E. P. Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis. Neuron 71, 926–940 (2011).


  52. Cohen, M. R. & Kohn, A. Measuring and interpreting neuronal correlations. Nat. Neurosci. 14, 811–819 (2011).

  53. Murray, J. D. et al. A hierarchy of intrinsic timescales across primate cortex. Nat. Neurosci. 17, 1661–1663 (2014).

  54. Chaudhuri, R., Knoblauch, K., Gariel, M.-A., Kennedy, H. & Wang, X.-J. A large-scale circuit mechanism for hierarchical dynamical processing in the primate cortex. Neuron 88, 419–431 (2015).


  55. Rauschecker, J. P. & Scott, S. K. Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat. Neurosci. 12, 718–724 (2009).


  56. Sharpee, T. O., Atencio, C. A. & Schreiner, C. E. Hierarchical representations in the auditory cortex. Curr. Opin. Neurobiol. 21, 761–767 (2011).


  57. Zatorre, R. J., Belin, P. & Penhune, V. B. Structure and function of auditory cortex: music and speech. Trends Cogn. Sci. 6, 37–46 (2002).


  58. Poeppel, D. The analysis of speech in different temporal integration windows: cerebral lateralization as ‘asymmetric sampling in time’. Speech Commun. 41, 245–255 (2003).


  59. Hamilton, L. S., Oganian, Y., Hall, J. & Chang, E. F. Parallel and distributed encoding of speech across human auditory cortex. Cell 184, 4626–4639 (2021).


  60. Nourski, K. V. et al. Functional organization of human auditory cortex: investigation of response latencies through direct recordings. NeuroImage 101, 598–609 (2014).

  61. Bartlett, E. L. The organization and physiology of the auditory thalamus and its role in processing acoustic features important for speech perception. Brain Lang. 126, 29–48 (2013).


  62. Gattass, R., Gross, C. G. & Sandell, J. H. Visual topography of V2 in the macaque. J. Comp. Neurol. 201, 519–539 (1981).


  63. Dumoulin, S. O. & Wandell, B. A. Population receptive field estimates in human visual cortex. Neuroimage 39, 647–660 (2008).


  64. Ding, N., Melloni, L., Zhang, H., Tian, X. & Poeppel, D. Cortical tracking of hierarchical linguistic structures in connected speech. Nat. Neurosci. 19, 158–164 (2016).


  65. Suied, C., Agus, T. R., Thorpe, S. J., Mesgarani, N. & Pressnitzer, D. Auditory gist: recognition of very short sounds from timbre cues. J. Acoust. Soc. Am. 135, 1380–1391 (2014).


  66. Donhauser, P. W. & Baillet, S. Two distinct neural timescales for predictive speech processing. Neuron 105, 385–393 (2020).


  67. Ulanovsky, N., Las, L., Farkas, D. & Nelken, I. Multiple time scales of adaptation in auditory cortex neurons. J. Neurosci. 24, 10440–10453 (2004).


  68. Lu, K. et al. Implicit memory for complex sounds in higher auditory cortex of the ferret. J. Neurosci. 38, 9955–9966 (2018).


  69. Chew, S. J., Mello, C., Nottebohm, F., Jarvis, E. & Vicario, D. S. Decrements in auditory responses to a repeated conspecific song are long-lasting and require two periods of protein synthesis in the songbird forebrain. Proc. Natl Acad. Sci. USA 92, 3406–3410 (1995).


  70. Bianco, R. et al. Long-term implicit memory for sequential auditory patterns in humans. eLife 9, e56073 (2020).


  71. Miller, K. J., Honey, C. J., Hermes, D., Rao, R. P. & Ojemann, J. G. Broadband changes in the cortical surface potential track activation of functionally diverse neuronal populations. Neuroimage 85, 711–720 (2014).


  72. Leszczyński, M. et al. Dissociation of broadband high-frequency activity and neuronal firing in the neocortex. Sci. Adv. 6, eabb0977 (2020).


  73. Günel, B., Thiel, C. M. & Hildebrandt, K. J. Effects of exogenous auditory attention on temporal and spectral resolution. Front. Psychol. 9, 1984 (2018).


  74. Norman-Haignere, S. V. et al. Pitch-responsive cortical regions in congenital amusia. J. Neurosci. 36, 2986–2994 (2016).


  75. Norman-Haignere, S. et al. Intracranial recordings from human auditory cortex reveal a neural population selective for musical song. Preprint at bioRxiv (2020).

  76. Boebinger, D., Norman-Haignere, S. V., McDermott, J. H. & Kanwisher, N. Music-selective neural populations arise without musical training. J. Neurophysiol. 125, 2237–2263 (2021).


  77. Morosan, P. et al. Human primary auditory cortex: cytoarchitectonic subdivisions and mapping into a spatial reference system. Neuroimage 13, 684–701 (2001).


  78. Baumann, S., Petkov, C. I. & Griffiths, T. D. A unified framework for the organization of the primate auditory cortex. Front. Syst. Neurosci. 7, 11 (2013).


  79. Barr, D. J., Levy, R., Scheepers, C. & Tily, H. J. Random effects structure for confirmatory hypothesis testing: keep it maximal. J. Mem. Lang. 68, 255–278 (2013).


  80. Kuznetsova, A., Brockhoff, P. B. & Christensen, R. H. lmerTest package: tests in linear mixed effects models. J. Stat. Softw. 82, 1–26 (2017).


  81. Gelman, A. & Hill, J. Data Analysis Using Regression and Multilevel/Hierarchical Models (Cambridge Univ. Press, 2006).

  82. Schielzeth, H. et al. Robustness of linear mixed-effects models to violations of distributional assumptions. Methods Ecol. Evol. 11, 1141–1152 (2020).


  83. de Cheveigné, A. & Parra, L. C. Joint decorrelation, a versatile tool for multichannel data analysis. Neuroimage 98, 487–505 (2014).


  84. Murphy, K. P. Machine Learning: A Probabilistic Perspective (MIT Press, 2012).

  85. de Heer, W. A., Huth, A. G., Griffiths, T. L., Gallant, J. L. & Theunissen, F. E. The hierarchical cortical organization of human speech processing. J. Neurosci. 37, 6539–6557 (2017).

  86. Marquardt, D. W. An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math. 11, 431–441 (1963).


  87. Fisher, W. M. tsylb: NIST syllabification software, version 2 revised (1997).



Acknowledgements

We thank D. Maksumov, N. Agrawal, S. Montenegro, L. Yu, M. Leszczynski and I. Tal for help with data collection, S. Montenegro and H. Wang for help in localizing electrodes and A. Kell, S. David, J. McDermott, B. Conway, N. Kanwisher, N. Kriegeskorte and M. Leszczynski for comments on an earlier draft of this manuscript. This study was supported by the National Institutes of Health (NIDCD-K99-DC018051 to S.V.N.-H., NIDCD-R01-DC014279 to N.M., S10 OD018211 to N.M., NINDS-R01-NS084142 to C.A.S. and NIDCD-R01-DC018805 to N.M./A.F.) and the Howard Hughes Medical Institute (LSRF postdoctoral award to S.V.N.-H.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations



Contributions

S.V.N.-H., L.K.L., I.I. and E.M.M. collected data for the experiments described in this manuscript. O.D., W.D., N.A.F., G.M.K. and C.A.S. collectively planned, coordinated and executed the neurosurgical electrode implantation needed for intracranial monitoring. S.V.N.-H. performed the analyses with help from L.K.L. and designed the TCI method and model. S.V.N.-H., A.F. and N.M. designed the experiment. S.V.N.-H., A.F. and N.M. wrote the paper.

Corresponding authors

Correspondence to Sam V. Norman-Haignere or Nima Mesgarani.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Jérémy Giroud, Jonas Obleser, Benjamin Morillon and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Histogram of phoneme, syllable, and word durations in TIMIT.

Durations of phonemes, multi-phoneme syllables and multi-syllable words in the widely used TIMIT database. Phonemes and words are labeled in the database; syllables were computed from the phoneme labels using the software tsylb2 (ref. 87). The median durations are 64, 197 and 479 ms, respectively.
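The duration statistics above can be reproduced from any TIMIT-style annotation. The sketch below assumes labels in the standard TIMIT format of 'start_sample end_sample label' with sample indices at the corpus's 16-kHz rate; the example lines are hypothetical.

```python
import numpy as np

TIMIT_FS = 16000  # TIMIT audio is sampled at 16 kHz

def durations_ms(label_lines):
    """Durations in ms from TIMIT-style label lines: 'start end label',
    where start/end are sample indices."""
    out = []
    for line in label_lines:
        start, end, _label = line.split()
        out.append((int(end) - int(start)) / TIMIT_FS * 1000.0)
    return out

# hypothetical .phn-style lines, for illustration only
phn = ["0 1024 h#", "1024 2048 sh", "2048 3600 iy"]
med = float(np.median(durations_ms(phn)))
```

Applying the same computation to all phoneme, syllable and word intervals in the corpus yields the medians reported above.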

Extended Data Fig. 2 Cross-context correlation for 20 representative electrodes.

Electrodes were selected to illustrate the diversity of integration windows. Specifically, we partitioned all sound-responsive electrodes into 5 groups based on the width of their integration window, estimated using a model (Fig. 3 illustrates the model). For each group, we plot the four electrodes with the highest SNR (as measured by the test-retest correlation across the sound set). Electrodes are sorted by their integration width, which is indicated to the right of each plot, along with each electrode's location, hemisphere and subject number. Each plot shows the cross-context correlation and noise ceiling for a single electrode and segment duration (indicated above each column). Because the number of segments was inversely proportional to the segment duration, the cross-context correlation and noise ceiling were more stable for shorter segments. This property is helpful because shorter segment durations have fewer relevant time lags, so it matters that those lags be estimated reliably. The model used to estimate integration windows pooled across all lags and segment durations, taking into account the reliability of each data point.
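The cross-context correlation at the heart of the TCI analysis can be sketched in a few lines: segment-locked responses from two different random contexts are correlated across segments, lag by lag. This is a minimal numpy-only illustration of the idea, not the paper's implementation; the toy data, array shapes and function name are assumptions.

```python
import numpy as np

def cross_context_corr(resp_a, resp_b):
    """Correlate segment-locked responses across contexts, lag by lag.
    resp_a, resp_b: (n_segments, n_lags) responses to the same segments
    presented in two different random contexts. High correlation at a lag
    means the response there is driven by the shared segment, not its
    surrounding context."""
    return np.array([np.corrcoef(resp_a[:, t], resp_b[:, t])[0, 1]
                     for t in range(resp_a.shape[1])])

# toy demo: first 10 lags are context-driven (independent noise),
# later lags are segment-driven (shared across contexts)
rng = np.random.default_rng(0)
shared = rng.standard_normal((30, 10))
a = np.concatenate([rng.standard_normal((30, 10)), shared], axis=1)
b = np.concatenate([rng.standard_normal((30, 10)), shared], axis=1)
r = cross_context_corr(a, b)   # near 0 at early lags, near 1 at later lags
```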

Extended Data Fig. 3 Simulation results.

a, Integration windows estimated from four different model responses (from top to bottom): (1) a model that integrated waveform magnitudes within a known window; (2) a model that integrated energy within a cochlear frequency band; (3) a model that integrated spectrotemporal energy in a cochleagram representation of sound; and (4) a simple, deep neural network. All models had a ground-truth, Gamma-distributed integration window. We independently varied the width and centre of the integration window (excluding non-causal combinations) and tested whether we could infer the ground-truth values. Results are shown for several different SNRs, as measured by the test-retest correlation of the response across repetitions, the same metric used to select electrodes (we selected electrodes with a test-retest correlation greater than 0.1). Black dots correspond to a single model window/simulation. Red dots show the median estimate across all windows/simulations. Some models included more variants (for example, different spectrotemporal filters), which is why some plots have a higher dot density. There is a small upward bias for very narrow integration widths (31 ms), probably due to the filter used to measure broadband gamma, which has an integration width of ~19 ms. The integration widths of our electrodes (~50 to 400 ms) were mostly above the point at which this bias would have a substantial effect, and the bias works against our observed results because it compresses the possible range of integration widths. b, Integration windows estimated without explicitly modeling and accounting for boundary effects. Results are shown for the spectrotemporal model, which produces strong responses at the boundary between two segments due to prominent spectrotemporal changes. Note that there is a nontrivial upward bias, particularly for integration widths, when boundary effects are not accounted for (see Methods for a more detailed discussion).
c, Integration windows estimated without accounting for an upward bias in the squared-error loss. The bias grows as the SNR decreases (see Methods for an explanation). Results are shown for the waveform-amplitude model, but the bias is present for all models because it is caused by the loss. Our bias-corrected loss largely corrected the problem, as can be seen in panel a.
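The first simulated model (waveform magnitudes integrated within a known Gamma-shaped window) can be sketched as follows. The shape/scale parameterization of the window below is an assumption for illustration; the paper's exact parameterization is given in Methods.

```python
import numpy as np

def gamma_window(n_lags, shape, scale):
    """Causal, Gamma-shaped integration window with unit area. The
    shape/scale parameterization here is illustrative."""
    t = np.arange(1, n_lags + 1, dtype=float)
    w = t ** (shape - 1.0) * np.exp(-t / scale)
    return w / w.sum()

def model_response(stimulus, window):
    """Toy version of model (1): integrate waveform magnitudes within the
    known window."""
    return np.convolve(np.abs(stimulus), window)[: len(stimulus)]

w = gamma_window(64, shape=2.0, scale=10.0)
stim = np.zeros(200)
stim[0] = 1.0                    # impulse probe
resp = model_response(stim, w)   # impulse response recovers the window
```

Because the simulated window is known exactly, an estimator applied to such responses can be checked against the ground truth, as in panel a.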

Extended Data Fig. 4 Integration windows for different electrode types and subjects.

a, This panel plots integration widths (left) and centres (right) for individual electrodes as a function of distance to primary auditory cortex, defined as posteromedial Heschl's gyrus. The electrodes are labeled by their type (grid, depth, strip). The grid/strip electrodes were located further from primary auditory cortex on average but, given their location, did not show any obvious difference in integration properties. The effect of distance was significant for the depth electrodes alone (the most numerous electrode type) when excluding grids and strips (width: F(1,14.53) = 24.51, p < 0.001, β_distance = 0.065 octaves/mm, CI = [0.039, 0.090]; centre: F(1,12.83) = 27.76, p < 0.001, β_distance = 0.052 octaves/mm, CI = [0.032, 0.071]; N = 114 electrodes). To be conservative, electrode type was included as a covariate in the linear mixed-effects model used to assess significance as a whole. b, Same as panel a but indicating subject membership instead of electrode type. Each symbol corresponds to a unique subject. The effect of distance on integration windows is broadly distributed across the 18 subjects.
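The paper assesses the distance effect with a linear mixed-effects model (widths are log2-transformed, hence slopes in octaves/mm; subjects enter as random effects). As a rough numpy-only stand-in for that model, one can fit the distance slope separately within each subject and summarize across subjects; the function name and synthetic data below are purely illustrative.

```python
import numpy as np

def per_subject_slopes(distance, log2_width, subject):
    """Fit the distance slope (octaves/mm, since widths are in log2 units)
    separately within each subject; a crude stand-in for the paper's
    linear mixed-effects model."""
    slopes = []
    for s in np.unique(subject):
        m = subject == s
        slope, _intercept = np.polyfit(distance[m], log2_width[m], 1)
        slopes.append(slope)
    return np.array(slopes)

# synthetic data with a true slope of 0.06 octaves/mm (illustrative only)
rng = np.random.default_rng(0)
subject = np.repeat(np.arange(5), 40)
distance = rng.uniform(0, 30, size=200)            # mm from primary AC
log2_width = (0.06 * distance
              + rng.normal(0, 0.5, size=5)[subject]  # per-subject offset
              + rng.normal(0, 0.05, size=200))       # measurement noise
slopes = per_subject_slopes(distance, log2_width, subject)
```

Averaging the per-subject slopes recovers the simulated effect; the full mixed-effects model additionally pools uncertainty across subjects and covariates such as electrode type.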

Extended Data Fig. 5 Robustness analyses.

a, Sound segments were excerpted from 10 sounds. This panel shows integration windows estimated using segments drawn from two non-overlapping splits of 5 sounds each (listed on the left). Because many non-primary regions respond strongly only to speech or music (refs. 8, 9, 11), we included speech and music in both splits. Format is analogous to Fig. 4 but shows only integration widths (integration centres were also similar across analysis variants). The effect of distance was significant for both splits (split 1: F(1,12.660) = 40.20, p < 0.001, β_distance = 0.069 octaves/mm, CI = [0.047, 0.090], N = 136 electrodes; split 2: F(1,21.66) = 30.11, p < 0.001, β_distance = 0.066 octaves/mm, CI = [0.043, 0.090], N = 135 electrodes). b, Shorter segments were created by subdividing longer segments, which made it possible to consider two types of context (see schematic): (1) random context, in which each segment is surrounded by random other segments; and (2) natural context, in which a segment is a subset of a longer segment and is thus surrounded by its natural context. When comparing responses across contexts, one of the two contexts must be random so that the contexts differ, but the other context can be random or natural. Our main analyses pooled across both types of comparison. Here, we show integration widths estimated by comparing either purely random contexts (top panel) or random and natural contexts (bottom panel). The effect of distance was significant for both types of context comparison (random-random: F(1,28.056) = 30.01, p < 0.001, β_distance = 0.064 octaves/mm, CI = [0.041, 0.087], N = 121 electrodes; random-natural: F(1,18.816) = 27.087, p < 0.001, β_distance = 0.062 octaves/mm, CI = [0.039, 0.086], N = 154 electrodes). c, We modeled integration windows using window shapes that varied from more exponential to more Gaussian (the parameter γ in equations 2 and 3 controls the shape of the window; see Methods). For our main analysis, we selected the shape that yielded the best prediction for each electrode. This panel shows integration widths estimated using two different fixed shapes. The effect of distance was significant for both shapes (γ = 1: F(1,21.712) = 24.85, p < 0.001, β_distance = 0.067 octaves/mm, CI = [0.040, 0.093], N = 154 electrodes; γ = 4: F(1,20.973) = 19.38, p < 0.001, β_distance = 0.055 octaves/mm, CI = [0.031, 0.080], N = 154 electrodes). d, Similar results were obtained using two different frequency ranges to measure gamma power (70–100 Hz: F(1,21.05) = 19.38, p < 0.001, β_distance = 0.058 octaves/mm, CI = [0.032, 0.083], N = 133 electrodes; 100–140 Hz: F(1,20.56) = 12.57, p < 0.01, β_distance = 0.051 octaves/mm, CI = [0.023, 0.080], N = 131 electrodes).
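For panel d, gamma power is measured within a frequency band such as 70–100 Hz. Below is a self-contained sketch of one common way to compute a band-limited amplitude envelope (FFT brick-wall band-pass followed by the analytic/Hilbert amplitude); this is an assumption-laden illustration, not the paper's filtering pipeline, which is described in Methods.

```python
import numpy as np

def band_envelope(x, fs, lo, hi):
    """Band-limit x to [lo, hi] Hz with an FFT brick-wall filter, then
    return the analytic (Hilbert) amplitude. numpy-only sketch."""
    n = len(x)
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(n, 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    band = np.fft.irfft(X, n)
    # analytic signal via the standard Hilbert construction
    H = np.fft.fft(band)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(H * h))

fs = 1000
t = np.arange(fs) / fs
env_in = band_envelope(np.sin(2 * np.pi * 85 * t), fs, 70, 100)   # in band
env_out = band_envelope(np.sin(2 * np.pi * 30 * t), fs, 70, 100)  # out of band
```

An in-band sinusoid yields an envelope near its amplitude, while an out-of-band sinusoid is suppressed; swapping the band edges (for example, 100–140 Hz) implements the second variant in panel d.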

Extended Data Fig. 6 Relationship between integration widths and centres without any causality constraint.

This figure plots integration centres vs. widths for windows that were not explicitly constrained to be causal. Results were similar to those with an explicit causality constraint (Fig. 4c). Same format as Fig. 4c.

Extended Data Fig. 7 Components most selective for sound categories at different integration widths.

Electrodes were subdivided into three equally sized groups based on the width of their integration window. The time-averaged response of each electrode was then projected onto the top 2 components that showed the greatest category selectivity, measured using linear discriminant analysis (each circle corresponds to a unique sound). Same format as Fig. 5b, which plots responses projected onto the top 2 principal components. Half of the sounds were used to compute the components, and the other half were used to measure their response to avoid statistical circularity. As a consequence, there are half as many sounds as in Fig. 5b.
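The split-half logic above (fit category-discriminant components on half the sounds, then project the held-out half to avoid circularity) can be illustrated with a toy two-class Fisher discriminant. All data, dimensions and labels below are synthetic; the paper's analysis applied linear discriminant analysis to time-averaged electrode responses.

```python
import numpy as np

def lda_direction(X, y):
    """Fisher discriminant direction for two classes (numpy-only sketch)."""
    X0, X1 = X[y == 0], X[y == 1]
    Sw = np.cov(X0.T) + np.cov(X1.T)               # within-class scatter
    delta = X1.mean(axis=0) - X0.mean(axis=0)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), delta)
    return w / np.linalg.norm(w)

# synthetic 'sounds': class 1 is offset along the first feature
rng = np.random.default_rng(1)
n, d = 100, 4
y = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, d))
X[y == 1, 0] += 3.0
fit_half, held_half = np.arange(0, n, 2), np.arange(1, n, 2)  # two halves
w = lda_direction(X[fit_half], y[fit_half])
scores = X[held_half] @ w   # project only held-out sounds (no circularity)
```

The held-out classes remain well separated along the fitted direction, which is the property the projected responses in this figure visualize.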

Extended Data Fig. 8 Results for integration-matched responses.

a, For our functional selectivity analyses, we subdivided the electrodes into three equally sized groups based on the width of their integration window. To test whether our results were an inevitable consequence of differences in temporal integration, we matched the integration windows across the electrodes in each group. Matching was performed by integrating the responses from the electrodes in the short and intermediate groups within an appropriately chosen window, such that the resulting integration window matched those of the longest group (see Integration matching in Methods). This figure plots a histogram of the effective integration windows after matching. b-d, These panels show the results of applying our functional selectivity analyses to integration-matched responses. Format is the same as in Fig. 5b-d.
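The matching logic rests on the fact that cascading two linear integrators yields an effective window equal to the convolution of their windows, so smoothing a short-window response with an extra kernel lengthens its effective window. A toy sketch with hypothetical boxcar windows:

```python
import numpy as np

# Hypothetical boxcar windows, purely illustrative.
w_short = np.ones(5) / 5      # short group's own integration window
w_extra = np.ones(8) / 8      # extra smoothing applied for matching
w_effective = np.convolve(w_short, w_extra)   # combined effective window

stim = np.zeros(50)
stim[0] = 1.0                                  # impulse probe
resp_short = np.convolve(stim, w_short)[:50]   # short-window response
resp_matched = np.convolve(resp_short, w_extra)[:50]  # after matching
```

The matched response's impulse response equals the convolution of the two windows (12 samples of support here versus the original 5); in the paper, the extra window is chosen so that the combined window matches that of the longest-integration group.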

Supplementary information

Source data

Source Data Fig. 4

Integration widths and centres for all electrodes along with their distance to primary auditory cortex and relevant metadata (that is, hemisphere, subject ID and electrode type).

Source Data Fig. 5

Principal component loadings plotted in Fig. 5b. Prediction accuracies for acoustic features and category labels for all electrodes along with relevant metadata (that is, hemisphere, subject ID, electrode type and reliability ceiling).


About this article


Cite this article

Norman-Haignere, S.V., Long, L.K., Devinsky, O. et al. Multiscale temporal integration organizes hierarchical computation in human auditory cortex. Nat Hum Behav 6, 455–469 (2022).

