The most critical attribute of human language is its unbounded combinatorial nature: smaller elements can be combined into larger structures on the basis of a grammatical system, resulting in a hierarchy of linguistic units, such as words, phrases and sentences. Mentally parsing and representing such structures, however, poses challenges for speech comprehension. In speech, hierarchical linguistic structures do not have boundaries that are clearly defined by acoustic cues and must therefore be internally and incrementally constructed during comprehension. We found that, during listening to connected speech, cortical activity of different timescales concurrently tracked the time course of abstract linguistic structures at different hierarchical levels, such as words, phrases and sentences. Notably, the neural tracking of hierarchical linguistic structures was dissociated from the encoding of acoustic cues and from the predictability of incoming words. Our results indicate that a hierarchy of neural processing timescales underlies grammar-based internal construction of hierarchical linguistic structure.
We thank J. Walker for MEG technical support, T. Thesen, W. Doyle and O. Devinsky for their instrumental help in collecting ECoG data, and G. Buzsaki, G. Cogan, S. Dehaene, A.-L. Giraud, G. Hickok, N. Hornstein, E. Lau, A. Marantz, N. Mesgarani, M. Peña, B. Pesaran, L. Pylkkänen, C. Schroeder, J. Simon and W. Singer for their comments on previous versions of the manuscript. This work was supported by US National Institutes of Health grant 2R01DC05660 (D.P.), Major Projects Program of the Shanghai Municipal Science and Technology Commission (STCSM) 15JC1400104 (X.T.) and National Natural Science Foundation of China 31500914 (X.T.).
The authors declare no competing financial interests.
Integrated supplementary information
(A) For 4-syllable sentences, in each trial, 10 sentences are presented sequentially without any acoustic gap between them. English examples are given below the Chinese sentences/phrases to illustrate their syntactic structures (not direct translations). The same trial structure applies for 4-syllable verb phrases, except that each 4-syllable sentence (bounded by the dashed red box) is replaced by a 4-syllable type I verb phrase (B) or type II verb phrase (C). (D) For 2-syllable phrases, 20 phrases are presented sequentially in each trial. (E) Grammar for the constant predictability Markovian language. (F) The trial structure of Markovian language stimulus.
Supplementary Figure 2 The spectrum of the temporal envelope for the Chinese (A) and English (B) 4-syllable sentence stimuli.
The power spectrum is averaged over all stimulus trials, and the SEM across trials is shown (shaded area). A spectral peak is seen at the syllabic rate but not at the phrasal or sentential rates, confirming that the sentential and phrasal structure is not conveyed by acoustic power cues. The stimulus envelope is the half-wave rectified sound waveform. The two conditions shown for each language are not significantly different from each other (P > 0.15, FDR corrected).
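The envelope analysis described in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy stimulus (a 100-Hz carrier amplitude-modulated at a 4-Hz syllabic rate), the sampling rate, the trial duration, the three carrier phases standing in for trials, and the helper names `toy_trial`, `half_wave_rectify` and `power_at` are all assumptions made for illustration. The sketch half-wave rectifies each trial, evaluates the envelope power at 1, 2 and 4 Hz, and reports the mean and SEM across trials.

```python
import cmath
import math
import statistics


def half_wave_rectify(signal):
    """Envelope as defined in the caption: keep positive samples, zero the rest."""
    return [max(s, 0.0) for s in signal]


def power_at(signal, fs, freq):
    """Power of one discrete Fourier component at `freq` (Hz), normalized by length."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
              for k, x in enumerate(signal))
    return abs(acc / n) ** 2


def toy_trial(fs, duration, syllable_rate, carrier_hz, phase):
    """Hypothetical stimulus: a carrier whose amplitude rises and falls once per syllable."""
    out = []
    for k in range(int(fs * duration)):
        t = k / fs
        am = 0.5 * (1.0 - math.cos(2 * math.pi * syllable_rate * t))
        out.append(am * math.sin(2 * math.pi * carrier_hz * t + phase))
    return out


FS = 1000          # assumed sampling rate (Hz)
DURATION = 4.0     # assumed trial duration (s)

# Three toy "trials" that differ only in carrier phase (an assumption).
trials = [toy_trial(FS, DURATION, 4.0, 100.0, phase) for phase in (0.0, 0.7, 1.4)]
envelopes = [half_wave_rectify(trial) for trial in trials]

# Mean power and SEM across trials at the sentential, phrasal and syllabic rates.
spectrum = {}
for freq in (1.0, 2.0, 4.0):
    powers = [power_at(env, FS, freq) for env in envelopes]
    mean = statistics.mean(powers)
    sem = statistics.stdev(powers) / math.sqrt(len(powers))
    spectrum[freq] = (mean, sem)

for freq, (mean, sem) in spectrum.items():
    print(f"{freq:.0f} Hz: mean power {mean:.3e}, SEM {sem:.3e}")
```

In this toy case the envelope carries power at the 4-Hz syllabic rate and essentially none at 1 or 2 Hz, which is the pattern the caption reports for the real stimuli.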
Supplementary Figure 3 Comparisons between the responses to stimuli of different linguistic structures.
The tree diagrams at the top illustrate the four linguistic structures tested. All of them are constructed using an isochronous syllable sequence at 4 Hz. For Structure I, syllables or backward syllables are presented in a random order, not grouped into larger linguistic structures. For Structure II, every two syllables combine into a phrase, which activates a phrasal rhythm at 2 Hz in addition to the 4-Hz syllabic rhythm. For Structure III, a 4-syllable verb phrase is constructed using a monosyllabic verb followed by a 3-syllable noun phrase. The 4-syllable verb phrase is frequency-tagged at 1 Hz but no linguistic structure is uniquely tagged at 2 Hz. For Structure IV, a 4-syllable structure evenly divides into two 2-syllable structures. The binary hierarchical embedding results in three levels of linguistic structures tagged at 1 Hz, 2 Hz, and 4 Hz, respectively. (A) For Chinese listeners (dark red bars), the 1-Hz response is significantly stronger for stimuli containing a 4-syllable constituent structure (yellow box). For English listeners who cannot parse the linguistic structure (blue bars), however, the response is not significantly different between conditions. All significant differences between conditions are shown and a thick gray bar indicates significant differences between two groups such that each condition in one group is significantly different from any condition in the other group (P < 0.03, t-test, FDR corrected). (B) The response at 2 Hz is stronger for stimuli containing 2-syllable phrasal structures (dashed green box) for Chinese listeners, but not so for English listeners. (C) A 4-Hz response at the syllabic rate is seen in all tested conditions and both listener groups, but it is weaker for backward syllables than for normal syllables.
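The frequency-tagging logic behind the four structures can be sketched as follows. This is an illustrative reading of the caption, not code from the study: the nested-list encoding of one repeating unit, the rule that a rate counts as tagged only when equal-length constituents tile the unit, and the names `collect_spans` and `tagged_rates` are assumptions.

```python
def collect_spans(node, start, spans):
    """Record the (start, end) syllable span of every constituent; return the end index."""
    if isinstance(node, str):          # a single syllable
        spans.add((start, start + 1))
        return start + 1
    pos = start
    for child in node:                 # a constituent made of smaller parts
        pos = collect_spans(child, pos, spans)
    spans.add((start, pos))
    return pos


def tagged_rates(cycle, syllable_rate=4.0):
    """Rates (Hz) at which equal-length constituents tile the repeating unit."""
    spans = set()
    total = collect_spans(cycle, 0, spans)
    rates = set()
    for length in range(1, total + 1):
        if total % length:
            continue                   # this length cannot tile the unit evenly
        if all((s, s + length) in spans for s in range(0, total, length)):
            rates.add(syllable_rate / length)
    return sorted(rates)


# One repeating unit per structure; "s" stands for any syllable.
structure_1 = "s"                              # ungrouped syllables
structure_2 = ["s", "s"]                       # 2-syllable phrase
structure_3 = ["s", ["s", "s", "s"]]           # verb + 3-syllable noun phrase
structure_4 = [["s", "s"], ["s", "s"]]         # two 2-syllable structures

for name, unit in [("I", structure_1), ("II", structure_2),
                   ("III", structure_3), ("IV", structure_4)]:
    print(name, tagged_rates(unit))
```

Under these assumptions Structure III yields 1 Hz and 4 Hz but not 2 Hz, because no constituent boundary falls after the second syllable, whereas Structure IV yields all three rates.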
Supplementary Figure 4 Dissociating neural encoding of sentential structures and transitional probability using Artificial Markovian Sentences (AMS).
(A) Grammar of the AMS. Each AMS consisted of 3 components, and each syllable was independently chosen from 3 candidate syllables with equal probability. In each trial, 33 sentences were played in a sequence without any gap in between them. (B) Procedures of the AMS experiment. The experiment has two sessions. In the first session (upper row), stimuli from each set of the AMS were played in separate blocks, before the listeners were instructed about the grammar of the AMS. In the second session, the 5 sets of AMS were learned in separate blocks. In the training phase of each block (labeled by T), the listeners listened to sentences from the AMS set and these sentences were separated by a 300 ms gap. After the training phase, the listeners listened to the same stimuli they heard in the first session. At the end of the block, the listeners had to report the grammar of the AMS set. (C) Neural response spectrum before (left) and after training (right). Before the listeners learn the grammar of the AMS, cortical activity only tracks the syllabic rhythm of speech. After learning, however, cortical activity concurrently follows the syllabic rhythm and the sentential rhythm. Since each trial (excluding the first sentence) is 53.1 seconds in duration, the frequency resolution of the spectrum is 0.019 Hz. Frequency bins showing power stronger than the mean power of a neighboring 1 Hz region (i.e., 0.5 Hz on each side) are shown by stars (N = 5, P < 0.001, paired t-test, FDR corrected).
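The arithmetic in panel (C) can be checked with a short sketch. The frequency resolution follows from the trial duration given in the caption; the neighbor baseline is one plausible reading of "the mean power of a neighboring 1 Hz region". Excluding the tested bin from its own baseline, the toy spectrum, and the names `frequency_resolution` and `neighbor_mean` are assumptions, and the paired t-test with FDR correction across participants is not reproduced here.

```python
def frequency_resolution(duration_s):
    """Spacing between spectral bins (Hz) for an analysis window of the given duration."""
    return 1.0 / duration_s


def neighbor_mean(power, bin_index, resolution_hz, half_width_hz=0.5):
    """Mean power of the bins within half_width_hz on each side of bin_index.

    Assumption: the tested bin itself is left out of the baseline.
    """
    half = int(round(half_width_hz / resolution_hz))
    lo = max(0, bin_index - half)
    hi = min(len(power), bin_index + half + 1)
    neighbors = [power[i] for i in range(lo, hi) if i != bin_index]
    return sum(neighbors) / len(neighbors)


resolution = frequency_resolution(53.1)       # trial duration from the caption
print(round(resolution, 3))                   # 0.019

# Toy spectrum (hypothetical values): flat background with one peak.
toy_power = [1.0] * 200
peak_bin = 100
toy_power[peak_bin] = 5.0

baseline = neighbor_mean(toy_power, peak_bin, resolution)
print(toy_power[peak_bin] > baseline)         # True
```

A bin would be a candidate for a star in the figure only if it exceeds this baseline consistently across participants, which is what the paired test in the caption assesses.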
Color differentiates the 5 participants.
About this article
Cite this article
Ding, N., Melloni, L., Zhang, H. et al. Cortical tracking of hierarchical linguistic structures in connected speech. Nat Neurosci 19, 158–164 (2016). https://doi.org/10.1038/nn.4186