The cortical analysis of speech-specific temporal structure revealed by responses to sound quilts

Abstract

Speech contains temporal structure that the brain must analyze to enable linguistic processing. To investigate the neural basis of this analysis, we used sound quilts, stimuli constructed by shuffling segments of a natural sound, approximately preserving its properties on short timescales while disrupting them on longer scales. We generated quilts from foreign speech to eliminate language cues and manipulated the extent of natural acoustic structure by varying the segment length. Using functional magnetic resonance imaging, we identified bilateral regions of the superior temporal sulcus (STS) whose responses varied with segment length. This effect was absent in primary auditory cortex and did not occur for quilts made from other natural sounds or acoustically matched synthetic sounds, suggesting tuning to speech-specific spectrotemporal structure. When examined parametrically, the STS response increased with segment length up to 500 ms. Our results identify a locus of speech analysis in human auditory cortex that is distinct from lexical, semantic or syntactic processes.
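
The quilting manipulation is simple to sketch. Below is a minimal, hypothetical Python illustration of the core idea (chop a waveform into fixed-length segments and concatenate them in random order), assuming the signal is a NumPy array. The published algorithm is more careful than this: it selects segment orderings that match the acoustics at segment boundaries and concatenates segments with pitch-synchronous overlap-add (PSOLA). Treat the sketch as an illustration rather than the authors' implementation; the function and parameter names are invented here.

```python
import numpy as np

def make_quilt(signal, fs, seg_ms, rng=None):
    """Minimal sound-quilt sketch: split a waveform into fixed-length
    segments and concatenate them in random order. The published
    algorithm additionally matches acoustics at segment boundaries
    and concatenates with PSOLA; this shows only the core shuffling."""
    rng = np.random.default_rng() if rng is None else rng
    seg_len = int(fs * seg_ms / 1000)        # samples per segment
    n_segs = len(signal) // seg_len          # drop any ragged tail
    segments = signal[:n_segs * seg_len].reshape(n_segs, seg_len)
    return segments[rng.permutation(n_segs)].ravel()

# Example: 30 ms vs. 960 ms quilts of the same (placeholder) recording.
fs = 16000                                   # assumed sampling rate
speech = np.random.randn(fs * 10)            # stand-in for a speech waveform
quilt_30 = make_quilt(speech, fs, seg_ms=30)
quilt_960 = make_quilt(speech, fs, seg_ms=960)
```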

Figure 1: Schematic of the quilting algorithm and example stimuli.
Figure 2: Extent and location of ROIs.
Figure 3: Responses to German speech quilts as a function of segment length in four ROIs: HG (Heschl's gyrus; red), PT (planum temporale; blue), group fROI (green) and individual fROIs (black), shown separately for the two hemispheres.
Figure 4: Responses to modulation control stimuli.
Figure 5: Responses to environmental sound quilts.
Figure 6: Responses to noise-vocoded speech quilts.
Figure 7: Naturalness ratings and responses to compressed speech quilts.
Figure 8: Functional ROIs revealed by a parcellation algorithm (ref. 29).

References

  1. Stevens, K.N. Acoustic Phonetics (MIT Press, 2000).

  2. Poeppel, D., Idsardi, W.J. & van Wassenhove, V. Speech perception at the interface of neurobiology and linguistics. Phil. Trans. R. Soc. Lond. B 363, 1071–1086 (2008).

  3. Scott, S.K., Blank, C.C., Rosen, S. & Wise, R.J. Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123, 2400–2406 (2000).

  4. Hickok, G. & Poeppel, D. The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402 (2007).

  5. Rauschecker, J.P. & Scott, S.K. Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat. Neurosci. 12, 718–724 (2009).

  6. Binder, J.R. et al. Human temporal lobe activation by speech and non-speech sounds. Cereb. Cortex 10, 512–528 (2000).

  7. Liebenthal, E., Binder, J.R., Spitzer, S.M., Possing, E.T. & Medler, D.A. Neural substrates of phonemic perception. Cereb. Cortex 15, 1621–1631 (2005).

  8. Obleser, J., Zimmermann, J., Van Meter, J. & Rauschecker, J.P. Multiple stages of auditory speech perception reflected in event-related fMRI. Cereb. Cortex 17, 2251–2257 (2007).

  9. Wild, C.J., Davis, M.H. & Johnsrude, I.S. Human auditory cortex is sensitive to the perceived clarity of speech. Neuroimage 60, 1490–1502 (2012).

  10. Giraud, A.L. et al. Contributions of sensory input, auditory search and verbal comprehension to cortical activity during speech processing. Cereb. Cortex 14, 247–255 (2004).

  11. Obleser, J., Eisner, F. & Kotz, S.A. Bilateral speech comprehension reflects differential sensitivity to spectral and temporal features. J. Neurosci. 28, 8116–8123 (2008).

  12. Zatorre, R.J. & Belin, P. Spectral and temporal processing in human auditory cortex. Cereb. Cortex 11, 946–953 (2001).

  13. Schönwiesner, M., Rübsamen, R. & von Cramon, D.Y. Hemispheric asymmetry for spectral and temporal processing in the human antero-lateral auditory belt cortex. Eur. J. Neurosci. 22, 1521–1528 (2005).

  14. Boemio, A., Fromm, S., Braun, A. & Poeppel, D. Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nat. Neurosci. 8, 389–395 (2005).

  15. Overath, T., Kumar, S., von Kriegstein, K. & Griffiths, T.D. Encoding of spectral correlation over time in auditory cortex. J. Neurosci. 28, 13268–13273 (2008).

  16. Overath, T., Zhang, Y., Sanes, D.H. & Poeppel, D. Sensitivity to temporal modulation rate and spectral bandwidth in the human auditory system: fMRI evidence. J. Neurophysiol. 107, 2042–2056 (2012).

  17. Greenberg, S. A multi-tier framework for understanding spoken language. in Listening to Speech: An Auditory Perspective (eds Greenberg, S. & Ainsworth, W.A.) 411–433 (Lawrence Erlbaum, 2006).

  18. Rosen, S. Temporal information in speech: acoustic, auditory and linguistic aspects. Phil. Trans. R. Soc. Lond. B 336, 367–373 (1992).

  19. Efros, A.A. & Leung, T.K. Texture synthesis by non-parametric sampling. in Proc. IEEE Int. Conf. Comput. Vis. 1033–1038 (IEEE, 1999).

  20. Grill-Spector, K. et al. A sequence of object-processing stages revealed by fMRI in the human occipital lobe. Hum. Brain Mapp. 6, 316–328 (1998).

  21. Lerner, Y., Honey, C.J., Silbert, L.J. & Hasson, U. Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. J. Neurosci. 31, 2906–2915 (2011).

  22. Pallier, C., Devauchelle, A.-D. & Dehaene, S. Cortical representation of the constituent structure of sentences. Proc. Natl. Acad. Sci. USA 108, 2522–2527 (2011).

  23. Abrams, D.A. et al. Decoding temporal structure in music and speech relies on shared brain resources but elicits different fine-scale spatial patterns. Cereb. Cortex 21, 1507–1518 (2011).

  24. Giraud, A.L. et al. Representation of the temporal envelope of sounds in the human brain. J. Neurophysiol. 84, 1588–1598 (2000).

  25. Harms, M.P., Guinan, J.J., Sigalovsky, I.S. & Melcher, J.R. Short-term sound temporal envelope characteristics determine multisecond time patterns of activity in human auditory cortex as shown by fMRI. J. Neurophysiol. 93, 210–222 (2005).

  26. McDermott, J.H. & Simoncelli, E.P. Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis. Neuron 71, 926–940 (2011).

  27. Shannon, R.V., Zeng, F.G., Kamath, V., Wygonski, J. & Ekelid, M. Speech recognition with primarily temporal cues. Science 270, 303–304 (1995).

  28. Davis, M.H. & Johnsrude, I. Hierarchical processing in spoken language comprehension. J. Neurosci. 23, 3423–3431 (2003).

  29. Fedorenko, E., Hsieh, P.J., Nieto-Castanon, A., Whitfield-Gabrieli, S. & Kanwisher, N. New method for fMRI investigations of language: defining ROIs functionally in individual subjects. J. Neurophysiol. 104, 1177–1194 (2010).

  30. Lashkari, D., Vul, E., Kanwisher, N.G. & Golland, P. Discovering structure in the space of fMRI selectivity profiles. Neuroimage 50, 1085–1098 (2010).

  31. Formisano, E., De Martino, F., Bonte, M. & Goebel, R. “Who” is saying “what”? Brain-based decoding of human voice and speech. Science 322, 970–973 (2008).

  32. Mesgarani, N., Cheung, C., Johnson, K. & Chang, E.F. Phonetic feature encoding in human superior temporal gyrus. Science 343, 1006–1010 (2014).

  33. Kanwisher, N., McDermott, J. & Chun, M.M. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302–4311 (1997).

  34. Price, C., Thierry, G. & Griffiths, T. Speech-specific auditory processing: where is it? Trends Cogn. Sci. 9, 271–276 (2005).

  35. Schirmer, A., Fox, M.P. & Grandjean, D. On the spatial organization of sound processing in the human temporal lobe: a meta-analysis. Neuroimage 63, 137–147 (2012).

  36. Ghitza, O. On the role of theta-driven syllabic parsing in decoding speech: intelligibility of speech with a manipulated modulation spectrum. Front. Psychol. 3, 238 (2012).

  37. Rauschecker, J.P. Cortical processing of complex sounds. Curr. Opin. Neurobiol. 8, 516–521 (1998).

  38. Norman-Haignere, S., Kanwisher, N. & McDermott, J.H. Cortical pitch regions in humans respond primarily to resolved harmonics and are located in specific tonotopic regions of anterior auditory cortex. J. Neurosci. 33, 19451–19469 (2013).

  39. Belin, P., Zatorre, R.J., Lafaille, P., Ahad, P. & Pike, B. Voice-selective areas in human auditory cortex. Nature 403, 309–312 (2000).

  40. Liebenthal, E., Desai, R.H., Humphries, C., Sabri, M. & Desai, A. The functional organization of the left STS: a large scale meta-analysis of PET and fMRI studies of healthy adults. Front. Neurosci. 8, 289 (2014).

  41. Peelle, J.E. The hemispheric lateralization of speech processing depends on what “speech” is: a hierarchical perspective. Front. Hum. Neurosci. 6, 309 (2012).

  42. Cogan, G.B. et al. Sensory-motor transformations for speech occur bilaterally. Nature 507, 94–98 (2014).

  43. McGettigan, C. et al. An application of univariate and multivariate approaches in FMRI to quantifying the hemispheric lateralization of acoustic and linguistic processes. J. Cogn. Neurosci. 24, 636–652 (2012).

  44. Voss, R.F. & Clarke, J. 1/f noise in music and speech. Nature 258, 317–318 (1975).

  45. Attias, H. & Schreiner, C.E. Temporal low-order statistics of natural sounds. in Advances in Neural Information Processing Systems Vol. 9 (eds Mozer, M.C., Jordan, M.I. & Petsche, T.) 27–33 (MIT Press, 1997).

  46. Meyer, M., Alter, K., Friederici, A.D., Lohmann, G. & von Cramon, D.Y. fMRI reveals brain regions mediating slow prosodic modulations in spoken sentences. Hum. Brain Mapp. 17, 73–88 (2002).

  47. Humphries, C., Sabri, M., Lewis, K. & Liebenthal, E. Hierarchical organization of speech perception in human auditory cortex. Front. Neurosci. 8, 406 (2014).

  48. Turken, A.U. & Dronkers, N.F. The neural architecture of the language comprehension network: converging evidence from lesion and connectivity analyses. Front. Syst. Neurosci. 5, 1 (2011).

  49. Lau, E.F., Phillips, C. & Poeppel, D. A cortical network for semantics: (de)constructing the N400. Nat. Rev. Neurosci. 9, 920–933 (2008).

  50. Petkov, C.I., Logothetis, N. & Obleser, J. Where are the human speech and voice regions, and do other animals have anything like them? Neuroscientist 15, 419–429 (2009).

  51. Desmond, J.E. & Glover, G.H. Estimating sample size in functional MRI (fMRI) neuroimaging studies: Statistical power analyses. J. Neurosci. Methods 118, 115–128 (2002).

  52. Moulines, E. & Charpentier, F. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Commun. 9, 453–467 (1990).

  53. Brainard, D.H. The psychophysics toolbox. Spat. Vis. 10, 433–436 (1997).

  54. Friston, K.J. et al. Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain Mapp. 2, 189–210 (1995).

  55. Rademacher, J. et al. Probabilistic mapping and volume measurement of human primary auditory cortex. Neuroimage 13, 669–683 (2001).

  56. Westbury, C.F., Zatorre, R.J. & Evans, A.C. Quantifying variability in the planum temporale: a probability map. Cereb. Cortex 9, 392–405 (1999).

  57. Brett, M., Anton, J.-L., Valabregue, R. & Poline, J.-B. Region of interest analysis using an SPM toolbox (abstract). Neuroimage 16 (suppl. 2) (2002).

Acknowledgements

The authors thank K. Doelling for assistance with data collection, G. Lewis for extensive help with visualization of the results using FreeSurfer, E. Fedorenko for assistance with the parcellation algorithm, D. Ellis for implementing the PSOLA algorithm for segment concatenation, the volunteers who kindly allowed us to record their speech, T. Schofield, N. Kanwisher and J. Golomb for helpful discussions, and N. Ding, A.-L. Giraud, E. Fedorenko, S. Norman-Haignere and J. Simon for helpful comments on earlier drafts of the manuscript. This work was supported by US National Institutes of Health grant 2R01DC05660 to D.P., a GRAMMY Foundation Research Grant to J.M.Z., and a McDonnell Scholar Award to J.H.M.

Author information

Contributions

T.O., J.H.M., J.M.Z. and D.P. designed the experiments, interpreted the data and wrote the manuscript. J.H.M. designed the quilting algorithm and generated the stimuli. T.O. and J.M.Z. acquired the fMRI data. J.H.M. acquired the behavioral data. T.O. and J.H.M. analyzed the data.

Corresponding authors

Correspondence to Tobias Overath or Josh H McDermott.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Glass-brain projection of the group L30 > L960 functional localizer contrast.

The results are shown at a statistical significance threshold of p < 0.001 (uncorrected for multiple comparisons).

Supplementary Figure 2 Frequency power spectra for speech and modulation control sounds.

Frequency power spectra for speech (solid) and modulation control sounds (dashed) quilted with 30 ms (red) and 960 ms (blue) segment durations.
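
To compute comparable spectra, one standard option (not necessarily the method used here) is Welch's power spectral density estimate; the sketch below uses placeholder data and hypothetical names.

```python
import numpy as np
from scipy.signal import welch

fs = 16000                            # assumed sampling rate
quilt = np.random.randn(fs * 4)       # placeholder for a quilt waveform

# Welch power spectral density estimate; the paper does not specify
# how its frequency power spectra were computed.
freqs, psd = welch(quilt, fs=fs, nperseg=2048)
```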

Supplementary Figure 3 Replicability across scanning sessions.

Replicability across scanning sessions for 12 participants who were scanned between two and four times. The graphs plot the BOLD response, normalized by the response to the 960 ms functional localizer condition, in each participant's individual fROIs in the right and left hemispheres (red and blue, respectively), for each scanning session (dashed, dash-dotted and dotted lines) that included the original (i.e., uncompressed) speech quilts. The solid line plots the average across scanning sessions. Note that the majority of participants exhibit a plateau at around the 480 ms segment duration.
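
As a minimal sketch of the normalization just described, assuming the per-session condition responses have already been extracted into an array (all names and values below are hypothetical):

```python
import numpy as np

# Hypothetical input: mean fROI response per session (rows) and per
# segment-duration condition (columns); the last column holds the
# 960 ms functional localizer condition used as the reference.
responses = np.array([[0.61, 0.82, 0.95, 1.04],
                      [0.55, 0.79, 0.91, 0.98]])

# Scale each session by its own 960 ms localizer response ...
normalized = responses / responses[:, -1:]

# ... and average across sessions (the solid line in the figure).
across_session_mean = normalized.mean(axis=0)
```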

Supplementary Figure 4 Clustering analysis of voxel response profiles (ref. 30).

(a) A permutation test revealed that only the top-ranked cluster (out of nine) was statistically significant (red line segment). (b) Profile of the top-ranked cluster for the eight experimental conditions (L30, L960, S30 to S960). Note that the data were mean-centered, which is why the response profile is negative for the conditions yielding a low response. (c) Rendering of this cluster on coronal cross-sections of the participants' average structural image (y = −38, −30, −22, −14, −6, 2, 10), thresholded at a voxel-to-functional-system assignment probability of r ≥ 0.7.
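
A toy sketch of the mean-centering step described in (b), with k-means standing in for the mixture-model clustering procedure of ref. 30 (random placeholder data; all names are hypothetical):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Hypothetical data: one response profile per voxel across the eight
# experimental conditions (L30, L960, S30 ... S960).
rng = np.random.default_rng(0)
profiles = rng.standard_normal((5000, 8))

# Mean-center each voxel's profile; conditions yielding a weak
# response then take negative values in the cluster profiles.
centered = profiles - profiles.mean(axis=1, keepdims=True)

# Cluster the centered profiles into nine clusters (k-means here;
# the analysis above used the procedure of ref. 30).
centroids, labels = kmeans2(centered, 9, minit='++', seed=0)
```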

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–4 and Supplementary Tables 1 and 2 (PDF 3438 kb)

Supplementary Methods Checklist (PDF 130 kb)

About this article

Cite this article

Overath, T., McDermott, J., Zarate, J. et al. The cortical analysis of speech-specific temporal structure revealed by responses to sound quilts. Nat Neurosci 18, 903–911 (2015). https://doi.org/10.1038/nn.4021
