Cortical tracking of hierarchical linguistic structures in connected speech

Abstract

The most critical attribute of human language is its unbounded combinatorial nature: smaller elements can be combined into larger structures on the basis of a grammatical system, resulting in a hierarchy of linguistic units, such as words, phrases and sentences. Mentally parsing and representing such structures, however, poses challenges for speech comprehension. In speech, hierarchical linguistic structures do not have boundaries that are clearly defined by acoustic cues and must therefore be internally and incrementally constructed during comprehension. We found that, while listening to connected speech, cortical activity at different timescales concurrently tracked the time course of abstract linguistic structures at different hierarchical levels, such as words, phrases and sentences. Notably, the neural tracking of hierarchical linguistic structures was dissociated from the encoding of acoustic cues and from the predictability of incoming words. Our results indicate that a hierarchy of neural processing timescales underlies grammar-based internal construction of hierarchical linguistic structure.


Figure 1: Neural tracking of hierarchical linguistic structures.
Figure 2: Tracking of different linguistic structures.
Figure 3: Dissociating sentential structures and transitional probability.
Figure 4: Neural tracking of sentences of varying structures.
Figure 5: Localizing cortical sources of the sentential and phrasal rate responses using ECoG (N = 5).
Figure 6: Spatial dissociation between sentential-rate, phrasal-rate and syllabic-rate responses (N = 5).
Figure 7: Syllabic-rate ECoG responses to English sentences and the acoustic control (N = 5).


Acknowledgements

We thank J. Walker for MEG technical support, T. Thesen, W. Doyle and O. Devinsky for their instrumental help in collecting ECoG data, and G. Buzsáki, G. Cogan, S. Dehaene, A.-L. Giraud, G. Hickok, N. Hornstein, E. Lau, A. Marantz, N. Mesgarani, M. Peña, B. Pesaran, L. Pylkkänen, C. Schroeder, J. Simon and W. Singer for their comments on previous versions of the manuscript. This work was supported by US National Institutes of Health grant 2R01DC05660 (D.P.), Major Projects Program of the Shanghai Municipal Science and Technology Commission (STCSM) 15JC1400104 (X.T.) and National Natural Science Foundation of China 31500914 (X.T.).

Author information


Contributions

N.D., L.M. and D.P. conceived and designed the experiments. N.D., H.Z. and X.T. performed the MEG experiments. L.M. performed the ECoG experiment. N.D., L.M. and D.P. wrote the paper. All of the authors discussed the results and edited the manuscript.

Corresponding authors

Correspondence to Nai Ding or David Poeppel.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Trial structure of Chinese (A-D) and English (E,F) speech materials.

(A) For 4-syllable sentences, in each trial, 10 sentences are presented sequentially without any acoustic gap between them. English examples are given below the Chinese sentences/phrases to illustrate their syntactic structures (not direct translations). The same trial structure applies to 4-syllable verb phrases, except that each 4-syllable sentence (bounded by the dashed red box) is replaced by a 4-syllable type I verb phrase (B) or type II verb phrase (C). (D) For 2-syllable phrases, 20 phrases are presented sequentially in each trial. (E) Grammar for the constant-predictability Markovian language. (F) The trial structure of the Markovian language stimulus.

Supplementary Figure 2 The spectrum of the temporal envelope for the Chinese (A) and English (B) 4-syllable sentence stimuli.

The power spectrum is averaged over all stimulus trials, and the SEM across trials is shown (shaded area). A spectral peak is seen at the syllabic rate but not at the phrasal or sentential rates, confirming that the sentential and phrasal structures are not conveyed by acoustic power cues. The stimulus envelope is the half-wave rectified sound waveform. The two conditions shown for each language are not significantly different from each other (P > 0.15, FDR corrected).
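The envelope-spectrum analysis described in this caption is straightforward to reproduce. The following is a minimal sketch under stated assumptions, not the authors' analysis code: it assumes the sound is available as a NumPy array, and `envelope_spectrum` is an illustrative name.

```python
import numpy as np

def envelope_spectrum(waveform, fs):
    """Power spectrum of the half-wave rectified envelope of a sound.

    waveform: 1-D array of audio samples; fs: sampling rate in Hz.
    The frequency resolution of the result is 1/duration of the input.
    """
    envelope = np.maximum(waveform, 0.0)      # half-wave rectification
    envelope = envelope - envelope.mean()     # remove the DC component
    power = np.abs(np.fft.rfft(envelope)) ** 2
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    return freqs, power

# Toy check: a tone amplitude-modulated at 4 Hz should show a 4-Hz
# envelope peak even though the waveform spectrum peaks at the carrier.
fs = 1000
t = np.arange(0, 10, 1.0 / fs)                # 10 s -> 0.1-Hz resolution
am_tone = (1 + np.cos(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 200 * t)
freqs, power = envelope_spectrum(am_tone, fs)
low = freqs < 20                              # inspect the modulation range
peak_hz = freqs[low][np.argmax(power[low])]
```

Restricting the search to low frequencies matters: the rectified carrier leaves strong components near 200 Hz, while the syllabic/phrasal/sentential rates of interest here all lie below a few Hz.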

Supplementary Figure 3 Comparisons between the responses to stimuli of different linguistic structures.

The tree diagrams at the top illustrate the four linguistic structures tested. All of them are constructed using an isochronous syllable sequence at 4 Hz. For Structure I, syllables or backward syllables are presented in a random order, not grouped into larger linguistic structures. For Structure II, every two syllables combine into a phrase, which activates a phrasal rhythm at 2 Hz in addition to the 4-Hz syllabic rhythm. For Structure III, a 4-syllable verb phrase is constructed using a monosyllabic verb followed by a 3-syllable noun phrase. The 4-syllable verb phrase is frequency-tagged at 1 Hz, but no linguistic structure is uniquely tagged at 2 Hz. For Structure IV, a 4-syllable structure evenly divides into two 2-syllable structures. The binary hierarchical embedding results in three levels of linguistic structures tagged at 1 Hz, 2 Hz and 4 Hz, respectively. (A) For Chinese listeners (dark red bars), the 1-Hz response is significantly stronger for stimuli containing a 4-syllable constituent structure (yellow box). For English listeners, who cannot parse the linguistic structure (blue bars), the response is not significantly different between conditions. All significant differences between conditions are shown, and a thick gray bar indicates significant differences between two groups such that each condition in one group is significantly different from any condition in the other group (P < 0.03, t-test, FDR corrected). (B) The response at 2 Hz is stronger for stimuli containing 2-syllable phrasal structures (dashed green box) for Chinese listeners, but not for English listeners. (C) A 4-Hz response at the syllabic rate is seen in all tested conditions and in both listener groups, but is weaker for backward syllables than for normal syllables.

Supplementary Figure 4 Dissociating neural encoding of sentential structures and transitional probability using Artificial Markovian Sentences (AMS).

(A) Grammar of the AMS. Each AMS consisted of 3 components, and each syllable was independently chosen from 3 candidate syllables with equal probability. In each trial, 33 sentences were played in a sequence without any gap between them. (B) Procedures of the AMS experiment. The experiment had two sessions. In the first session (upper row), stimuli from each set of the AMS were played in separate blocks, before the listeners were instructed about the grammar of the AMS. In the second session, the 5 sets of AMS were learned in separate blocks. In the training phase of each block (labeled T), the listeners heard sentences from the AMS set, separated by a 300-ms gap. After the training phase, the listeners heard the same stimuli they heard in the first session. At the end of each block, the listeners reported the grammar of the AMS set. (C) Neural response spectrum before (left) and after (right) training. Before the listeners learned the grammar of the AMS, cortical activity only tracked the syllabic rhythm of speech. After learning, however, cortical activity concurrently followed the syllabic rhythm and the sentential rhythm. Since each trial (excluding the first sentence) is 53.1 s in duration, the frequency resolution of the spectrum is 0.019 Hz. Frequency bins with power stronger than the mean power of a neighboring 1-Hz region (i.e., 0.5 Hz on each side) are marked with stars (N = 5, P < 0.001, paired t-test, FDR corrected).
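The bin-wise peak criterion in (C) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: `peak_above_neighbors` is a hypothetical helper that compares one frequency bin against the mean of the surrounding ±0.5-Hz bins.

```python
import numpy as np

def peak_above_neighbors(freqs, power, target_hz, neighborhood_hz=0.5):
    """Ratio of the power at the bin nearest target_hz to the mean power
    of the bins within +-neighborhood_hz (target bin excluded), mirroring
    the 'stronger than a neighboring 1-Hz region' criterion."""
    idx = np.argmin(np.abs(freqs - target_hz))
    neighbors = np.abs(freqs - freqs[idx]) <= neighborhood_hz
    neighbors[idx] = False                       # exclude the target bin
    return power[idx] / power[neighbors].mean()

# A 53.1-s trial gives a frequency resolution of 1/53.1 ~= 0.019 Hz.
duration = 53.1
resolution = 1.0 / duration

# Toy check: a sinusoid near 1 Hz sampled for 53.1 s should stand far
# above its +-0.5-Hz neighborhood.
fs, n = 10.0, 531
t = np.arange(n) / fs
x = np.sin(2 * np.pi * (53 / duration) * t)      # exactly on an FFT bin
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
power = np.abs(np.fft.rfft(x)) ** 2
ratio = peak_above_neighbors(freqs, power, 1.0)
```

The fine 0.019-Hz resolution is what makes this test sensitive: with ~53 bins inside each ±0.5-Hz neighborhood, the neighborhood mean is a stable noise baseline against which a frequency-tagged response stands out.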

Supplementary Figure 5 Coverage of the ECoG electrodes.

Color differentiates the 5 participants.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5 and Supplementary Tables 1 and 2 (PDF 1431 kb)

Supplementary Methods Checklist (PDF 384 kb)


About this article


Cite this article

Ding, N., Melloni, L., Zhang, H. et al. Cortical tracking of hierarchical linguistic structures in connected speech. Nat Neurosci 19, 158–164 (2016). https://doi.org/10.1038/nn.4186
