Cortical oscillations and speech processing: emerging computational principles and operations

Journal name: Nature Neuroscience


Neuronal oscillations are ubiquitous in the brain and may contribute to cognition in several ways: for example, by segregating information and organizing spike timing. Recent data show that delta, theta and gamma oscillations are specifically engaged by the multi-timescale, quasi-rhythmic properties of speech and can track its dynamics. We argue that they are foundational in speech and language processing, 'packaging' incoming information into units of the appropriate temporal granularity. Such stimulus-brain alignment arguably results from auditory and motor tuning throughout the evolution of speech and language and constitutes a natural model system allowing auditory research to make a unique contribution to the issue of how neural oscillatory activity affects human cognition.

At a glance


  1. Figure 1: A theory of early oscillation-based operations in speech perception.

    Five operations allow connected speech to be parsed by cortical theta and gamma oscillations. We assume a high-resolution spectro-temporal representation of speech in primary auditory cortex. We represent a typical spike train in layer IV cortical neurons. Most of these neurons phase-lock to speech amplitude modulations. Response onset elicits a reset of theta oscillations in superficial layers (step 1), where auditory cortex output is generated. After reset, theta oscillations track the speech envelope (step 2). Theta reset induces a transient pause in gamma activity and a subsequent reset of gamma oscillations. Theta and gamma generators that are weakly coupled at rest become more strongly coupled and nested (step 3). Gamma power controls the excitability of neurons generating the feedforward signal from A1 to higher-order areas (step 4). Neuronal excitability phase-aligns to speech modulations (step 5): gamma tends to be strong when the energy in the signal is weak.

  2. Figure 2: Speech–brain interaction from human intracortical recordings of primary auditory cortex.

    (a) Time–frequency representation of cortical activity at rest. (b) Time–frequency representation of cortical activity in response to the French spoken sentence “Le nouveau garde la porte.” (c) Stimulus spectrogram, which shows spectro-temporal modulations and formant structure. (d) An example modulation spectrum extracted from a band centered around 3 kHz (bandwidth 0.5 kHz). To cross-correlate speech with the brain response, the broadband speech spectrum (1–5 kHz) was split into 32 frequency bands, from each of which the temporal envelope in the 1–140 Hz modulation range was extracted. In the band shown, modulations cover this entire range. (e) Auditory cortex power strongly correlates with speech modulations in two frequency bands, theta and gamma. The theta band aligns to speech with zero time lag; the gamma band reflects speech modulations after a 40-ms time lag. (f) An index of inter-trial phase consistency, which reflects frequency-specific locking between stimulus and brain. The cross-correlation between this index and the stimulus indicates how oscillations phase-track speech amplitude modulations. White box, theta-gamma frequency nesting. These data provide experimental confirmation from human auditory cortex for the first three proposed operations (steps 1 to 3 in Fig. 1). SEEG, stereotactic EEG. Data courtesy of C. Liégeois-Chauvel, analyzed by B. Morillon, Y. Beigneux, L. Arnal, C. Bénar, C. Liégeois-Chauvel and A.-L.G.

  3. Figure 3: Generation of oscillations in a cortical column.

    (a) Schematic distribution of oscillatory and stimulus-driven spiking activity in a cortical column (courtesy M. Oberlaender, Max Planck Institute, Florida; modified from ref. 49). Oscillatory activity is typically detected in the superficial (II/III) and deep (V/VI) cortical layers, whereas stimulus-driven spiking is strongest in layer IV (right). (b) Cortical column networks that could underpin the operations depicted in Figure 1. We assume two populations of pyramidal neurons in superficial layers, one involved in low gamma generation, the other in theta generation. These populations are connected through an excitatory connection from theta to gamma (details in Fig. 5). Under the cumulative influence of theta and gamma oscillations, the spike train reflecting activity in input layer IV is transformed into a discontinuous spike train in the superficial layer, which will be read out by the next hierarchical stage.

  4. Figure 4: Comparison of neural responses in auditory primary and association (Brodmann area 22) cortices.

    (a,b) Time–frequency representations obtained from recordings made with stereotactic EEG in humans (see also Fig. 2) in response to a spoken sentence. (c,d) Theta phase–gamma power nesting. Although gamma power is stronger in association (lower panels) than in primary (upper panels) auditory cortex, it tracks fast stimulus modulations only in the primary region. Yet theta-gamma nesting (white box) is detectable in both areas, suggesting that gamma activity is controlled by the stimulus in primary auditory cortex but by theta activity in the association area. Note that theta tracking is also slower in the association area, supporting the notion of downsampling when progressing up the auditory cortical hierarchy.

  5. Figure 5: A biophysical model of coupled theta and gamma oscillations.

    (a) The model uses pyramidal-interneuron gamma (PING) and pyramidal-interneuron theta (PINT) networks, whereby oscillations at both frequencies are generated by the interaction between a pyramidal excitatory (exc.) population and an inhibitory (inh.) population. (b) Rastergram of the simulated network in response to an English sentence filtered through precortical auditory pathways (ref. 50); the input corresponds to one channel centered on 1.5 kHz. The network exhibits intrinsic gamma and theta activities before the onset of the sentence, and gamma oscillations are modulated by theta rhythms (ref. 26). The PINT generator phase-locks to the onset of slow modulations (5–10 Hz) in the speech signal, signaling syllables (ref. 13). The PINT network is connected to the PING network by an excitatory connection. The input–output and gamma parts of the network are similar to those in ref. 27. The response of output cells constitutes a binary (three-bit) code reflecting the shape of the speech envelope. Theta excitatory neurons (Te), dark green; theta inhibitory neurons (Ti), light green; gamma excitatory neurons (Ge), dark blue; gamma inhibitory neurons (Gi), light blue; output neurons (Out), black. The input (In) is plotted unscaled in red. The network is composed of 5 Te, 5 Ti, 60 Ge, 20 Gi and 25 output neurons, modeled as leaky integrate-and-fire neurons, with Ge and output neurons having an extra m-current (ref. 27). Synaptic release includes both synaptic rise and decay time constants. (c) Averaged oscillatory activity: theta activity phase-locks to the stimulus, and gamma activity follows speech envelope and theta activity. Model development and simulations by A. Hyafil, B. Gutkin, L. Fontolan, O. Ghitza and A.-L.G.

  6. Figure 6: Functional anatomy of the speech processing network.

    (a) Anatomy as adapted from ref. 33, deriving largely from mapping of cortical oscillations. Stronger correlations between regions (thick arrows, P ≤ 0.01) reflect stronger coupling between oscillatory activity in tested frequency bands and the BOLD response. (b) Anatomy as adapted from ref. 44, deriving largely from imaging and lesion–deficit data. Dotted lines illustrate the putative connectivity in the dorsal and ventral processing streams. Regions in the same color indicate areas implicated in oscillatory (a; ref. 33) or imaging and lesion (b; ref. 44) analyses. A1, primary auditory cortex; S2, secondary somatosensory cortex; BA40, Brodmann area 40 (supramarginal gyrus); STS, superior temporal sulcus; MTG, middle temporal gyrus; IFG, inferior frontal gyrus; PMC, premotor cortex; AT, anterior temporal cortex; AC, auditory cortex; SPT, sylvian parieto-temporal area; ITG, inferior temporal gyrus.
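The caption of Figure 2 (panel f) refers to an index of inter-trial phase consistency without giving a formula. A common way to compute such an index, assumed here for illustration and not necessarily the exact measure used for the figure, is the length of the mean unit phase vector across trials at one time–frequency point. The function name and the synthetic phase data below are hypothetical.

```python
import cmath
import math
import random


def inter_trial_coherence(phases):
    """Return the inter-trial phase consistency for one time-frequency point.

    `phases` holds one phase angle (radians) per trial. The result is the
    length of the mean unit vector: 1.0 when every trial has the same phase,
    close to 0.0 when phases are spread uniformly around the circle.
    """
    if not phases:
        raise ValueError("need at least one trial")
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


# Hypothetical data (not from the paper): 200 trials whose phases are
# jittered around a common value (phase-locked) versus 200 trials whose
# phases are drawn uniformly at random (not locked).
rng = random.Random(0)
locked = [0.5 + rng.gauss(0.0, 0.3) for _ in range(200)]
unlocked = [rng.uniform(-math.pi, math.pi) for _ in range(200)]

itc_locked = inter_trial_coherence(locked)
itc_unlocked = inter_trial_coherence(unlocked)
print(f"locked: {itc_locked:.2f}, unlocked: {itc_unlocked:.2f}")
```

In practice the phase per trial would come from a time–frequency decomposition of the recorded signal; the index is then evaluated at every time and frequency bin, and the resulting map can be cross-correlated with the stimulus envelope as the caption describes.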


References

  1. Heimbauer, L.A., Beran, M.J. & Owren, M.J. A chimpanzee recognizes synthetic speech with significantly reduced acoustic cues to phonetic content. Curr. Biol. 21, 1210–1214 (2011).
  2. Liberman, A.M. & Mattingly, I.G. The motor theory of speech perception revised. Cognition 21, 1–36 (1985).
  3. Poeppel, D. The analysis of speech in different temporal integration windows: cerebral lateralization as 'asymmetric sampling in time'. Speech Commun. 41, 245–255 (2003).
  4. Shannon, R.V. et al. Speech recognition with primarily temporal cues. Science 270, 303–304 (1995).
  5. Lorenzi, C. et al. Speech perception problems of the hearing impaired reflect inability to use temporal fine structure. Proc. Natl. Acad. Sci. USA 103, 18866–18869 (2006).
  6. Adank, P. & Janse, E. Perceptual learning of time-compressed and natural fast speech. J. Acoust. Soc. Am. 126, 2649–2659 (2009).
  7. Giraud, A.L. et al. Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron 56, 1127–1134 (2007).
  8. Ghitza, O. Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm. Front. Psychol. 2, 130 (2011).
  9. Ghitza, O. & Greenberg, S. On the possible role of brain rhythms in speech perception: intelligibility of time-compressed speech with periodic and aperiodic insertions of silence. Phonetica 66, 113–126 (2009).
  10. Liégeois-Chauvel, C. et al. Temporal envelope processing in the human left and right auditory cortices. Cereb. Cortex 14, 731–740 (2004).
  11. Ding, N. & Simon, J.Z. Neural representations of complex temporal modulations in the human auditory cortex. J. Neurophysiol. 102, 2731–2743 (2009).
  12. Luo, H. & Poeppel, D. Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron 54, 1001–1010 (2007).
  13. Ahissar, E. et al. Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proc. Natl. Acad. Sci. USA 98, 13367–13372 (2001).
  14. Abrams, D.A. et al. Right-hemisphere auditory cortex is dominant for coding syllable patterns in speech. J. Neurosci. 28, 3958–3965 (2008).
  15. Nourski, K.V. et al. Temporal envelope of time-compressed speech represented in the human auditory cortex. J. Neurosci. 29, 15564–15574 (2009).
  16. Canolty, R.T. & Knight, R.T. The functional role of cross-frequency coupling. Trends Cogn. Sci. 14, 506–515 (2010).
  17. Schroeder, C.E. & Lakatos, P. Low-frequency neuronal oscillations as instruments of sensory selection. Trends Neurosci. 32, 9–18 (2009).
  18. Atencio, C.A., Sharpee, T.O. & Schreiner, C.E. Cooperative nonlinearities in auditory cortical neurons. Neuron 58, 956–966 (2008).
  19. Sakata, S. & Harris, K.D. Laminar structure of spontaneous and sensory-evoked population activity in auditory cortex. Neuron 64, 404–418 (2009).
  20. Wang, X.J. Neurophysiological and computational principles of cortical rhythms in cognition. Physiol. Rev. 90, 1195–1268 (2010).
  21. Börgers, C., Epstein, S. & Kopell, N.J. Background gamma rhythmicity and attention in cortical local circuits: a computational study. Proc. Natl. Acad. Sci. USA 102, 7002–7007 (2005).
  22. Fries, P., Nikolic, D. & Singer, W. The gamma cycle. Trends Neurosci. 30, 309–316 (2007).
  23. Kayser, C., Logothetis, N.K. & Panzeri, S. Millisecond encoding precision of auditory cortex neurons. Proc. Natl. Acad. Sci. USA 107, 16976–16981 (2010).
  24. Chang, E.F. et al. Categorical speech representation in human superior temporal gyrus. Nat. Neurosci. 13, 1428–1432 (2010).
  25. Rauschecker, J.P. & Scott, S.K. Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat. Neurosci. 12, 718–724 (2009).
  26. Kopell, N. et al. Gamma and theta rhythms in biophysical models of hippocampal circuits. in Hippocampal Microcircuits: A Computational Modeller's Resource Book (eds. Cutsuridis, V., Graham, B.P., Cobb, S. & Vida, I.) Ch 15 (Springer, 2011).
  27. Shamir, M. et al. Representation of time-varying stimuli by a network exhibiting oscillations on a faster time scale. PLoS Comput. Biol. 5, e1000370 (2009).
  28. Atencio, C.A. & Schreiner, C.E. Columnar connectivity and laminar processing in cat primary auditory cortex. PLoS ONE 5, e9521 (2010).
  29. Zatorre, R.J., Belin, P. & Penhune, V.B. Structure and function of auditory cortex: music and speech. Trends Cogn. Sci. 6, 37–46 (2002).
  30. Boemio, A. et al. Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nat. Neurosci. 8, 389–395 (2005).
  31. Jamison, H.L. et al. Hemispheric specialization for processing auditory nonspeech stimuli. Cereb. Cortex 16, 1266–1275 (2006).
  32. Obleser, J., Eisner, F. & Kotz, S.A. Bilateral speech comprehension reflects differential sensitivity to spectral and temporal features. J. Neurosci. 28, 8116–8123 (2008).
  33. Morillon, B. et al. Neurophysiological origin of human brain asymmetry for speech and language. Proc. Natl. Acad. Sci. USA 107, 18688–18693 (2010).
  34. Telkemeyer, S. et al. Sensitivity of newborn auditory cortex to the temporal structure of sounds. J. Neurosci. 29, 14726–14733 (2009).
  35. Hutsler, J. & Galuske, R.A. Hemispheric asymmetries in cerebral cortical networks. Trends Neurosci. 26, 429–435 (2003).
  36. Gireesh, E.D. & Plenz, D. Neuronal avalanches organize as nested theta- and beta/gamma-oscillations during development of cortical layer 2/3. Proc. Natl. Acad. Sci. USA 105, 7576–7581 (2008).
  37. Pagnamenta, A.T. et al. Characterization of a family with rare deletions in CNTNAP5 and DOCK4 suggests novel risk loci for autism and dyslexia. Biol. Psychiatry 68, 320–328 (2010).
  38. Peschansky, V.J. et al. The effect of variation in expression of the candidate dyslexia susceptibility gene homolog Kiaa0319 on neuronal migration and dendritic morphology in the rat. Cereb. Cortex 20, 884–897 (2010).
  39. Wang, Y. et al. Dcdc2 knockout mice display exacerbated developmental disruptions following knockdown of doublecortin. Neuroscience 190, 398–408 (2011).
  40. Goswami, U. A temporal sampling framework for developmental dyslexia. Trends Cogn. Sci. 15, 3–10 (2011).
  41. Ramus, F. & Szenkovits, G. What phonological deficit? Q. J. Exp. Psychol. (Hove) 61, 129–141 (2008).
  42. Ziegler, J.C. et al. Speech-perception-in-noise deficits in dyslexia. Dev. Sci. 12, 732–745 (2009).
  43. Lehongre, K. et al. Altered low-gamma sampling in auditory cortex accounts for the three main facets of dyslexia. Neuron 72, 1080–1090 (2011).
  44. Hickok, G. & Poeppel, D. The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402 (2007).
  45. Holcombe, A.O. Seeing slow and seeing fast: two limits on perception. Trends Cogn. Sci. 13, 216–221 (2009).
  46. Eliades, S.J. & Wang, X. Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature 453, 1102–1106 (2008).
  47. Chandrasekaran, C. et al. Monkeys and humans share a common computation for face/voice integration. PLoS Comput. Biol. 7, e1002165 (2011).
  48. Schroeder, C.E. et al. Dynamics of active sensing and perceptual selection. Curr. Opin. Neurobiol. 20, 172–176 (2010).
  49. Oberlaender, M. et al. Cell type-specific three-dimensional structure of thalamocortical circuits in a column of rat vibrissal cortex. Cereb. Cortex doi:10.1093/cercor/bhr317 (16 November 2011).
  50. Chi, T., Ru, P. & Shamma, S.A. Multiresolution spectrotemporal analysis of complex sounds. J. Acoust. Soc. Am. 118, 887–906 (2005).

Author information


  1. Inserm U960, Département d'Etudes Cognitives, Ecole Normale Supérieure, Paris, France.

    • Anne-Lise Giraud
  2. Department of Psychology, New York University, New York, New York, USA.

    • David Poeppel

Competing financial interests

The authors declare no competing financial interests.
