We are pleased that our Perspective (ref. 1; Kazanina, N. & Tavano, A. What neural oscillations can and cannot do for syntactic structure building. Nat. Rev. Neurosci. 24, 113–128; 2023) led to a commentary by Ding (ref. 2; Ding, N. Low-frequency neural parsing of hierarchical linguistic structures. Nat. Rev. Neurosci. https://doi.org/10.1038/s41583-023-00749-y; 2023), who authored a key publication on the topic (ref. 3).

Ding appears to agree with our claim that oscillations-for-chunking (called ‘multiscale envelope tracking’ in ref. 2) are not suitable for hierarchical structure building. Ding describes an alternative approach, termed the ‘hierarchical structure building (HSB) hypothesis’, that is considered to be fit for purpose; in particular, it accounts for the findings in ref. 3. The HSB hypothesis states that “when a new word is added to the structure or the unit closes, there is a corresponding change in the activity of the neural population” (ref. 2). We note that, with respect to ref. 3, this explanation is remarkably similar to our interpretation of those findings in terms of evoked responses (see page 119 of ref. 1, where we write that, “Owing to the nature of the linguistic stimuli, the spectral peaks at 2 Hz and 1 Hz may simply reflect an evoked response corresponding to the parser’s regular building of phrases and sentences and clearing them out of working memory.”). We also note that, in its current state, the HSB hypothesis correlates some neural measure with attributes of a hierarchical syntactic structure and thus concerns only the outcome of syntactic structure construction. For the HSB hypothesis to genuinely serve as an account of how syntactic structure is encoded at the neural level, it needs to provide an explicit account of how a hierarchical syntactic structure can be reconstructed from the neural activity profile.
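The evoked-response reading of the 2 Hz and 1 Hz peaks can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not an analysis from refs. 1–3: the sampling rate, signal duration, kernel shape and all function names are hypothetical choices made for illustration. It places a transient, non-oscillatory response at every phrase closure (every 0.5 s) and every sentence closure (every 1 s) and then reads off the spectral amplitude at a few frequencies.

```python
import cmath
import math

FS = 100        # sampling rate in Hz (assumed for illustration)
DURATION = 20   # length of the simulated signal in seconds (assumed)
N = FS * DURATION

def evoked_kernel(t, tau=0.05):
    """Transient, non-oscillatory bump that decays within a few hundred ms."""
    return (t / tau) * math.exp(-t / tau)

def simulate(onsets, gain=1.0):
    """Sum one evoked response per event onset (wrapping at the signal end)."""
    signal = [0.0] * N
    kernel_len = int(0.4 * FS)  # 0.4 s is enough for the kernel to decay
    for onset in onsets:
        start = int(round(onset * FS))
        for k in range(kernel_len):
            signal[(start + k) % N] += gain * evoked_kernel(k / FS)
    return signal

def amplitude_at(signal, freq):
    """Amplitude of the discrete Fourier component at `freq` (in Hz)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * i / FS)
              for i, x in enumerate(signal))
    return abs(acc) / len(signal)

# Phrase closures every 0.5 s (2 Hz); sentence closures every 1 s (1 Hz).
phrase_onsets = [0.5 * k for k in range(2 * DURATION)]
sentence_onsets = [1.0 * k for k in range(DURATION)]

phrase_part = simulate(phrase_onsets)
sentence_part = simulate(sentence_onsets)
signal = [p + s for p, s in zip(phrase_part, sentence_part)]

# Compare the tagged frequencies with two untagged neighbours.
for f in (0.5, 1.0, 1.5, 2.0):
    print(f"{f:.1f} Hz: amplitude = {amplitude_at(signal, f):.4f}")
```

In this sketch no oscillator is involved at any point: spectral energy concentrates at 1 Hz and 2 Hz only because the transient responses recur at regular intervals, whereas the untagged frequencies (0.5 Hz and 1.5 Hz) carry essentially none.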

Ding writes that “there is no obvious reason to restrict the response frequency to be above 1 Hz” (referring to the frequencies that ‘tag’ phrases or sentences). However, in our Perspective, we did not suggest such a restriction. We are well aware of, and cited, research using a frequency-tagging paradigm that found spectral peaks below 1 Hz (refs. 3–6); for example, a study by one of the present authors (ref. 6) reported a spectral peak at 0.39 Hz.

Finally, Ding writes that our Perspective “starts from a neurophysiological concept — the delta oscillation — and analyses how the properties of delta oscillation constrain speech processing mechanisms” (ref. 2). We find this characterization surprising, even setting aside a potential terminological confusion linked to the term ‘speech processing’, which conventionally encompasses the decoding of auditory input into linguistic units such as phonemes, syllables or words, rather than syntactic structure building, a higher-level process shared by both oral (speech) and written language processing. Our Perspective centres on an essential functional requirement, namely a hierarchical syntactic structure and its intrinsic properties, and evaluates the potential of different neural mechanisms for constructing such a representation.