Evidence for causal top-down frontal contributions to predictive processes in speech perception

Perception relies on the integration of sensory information and prior expectations. Here we show that selective neurodegeneration of human frontal speech regions results in delayed reconciliation of predictions in temporal cortex. These temporal regions were not atrophic, displayed normal evoked magnetic and electrical power, and preserved neural sensitivity to manipulations of sensory detail. Frontal neurodegeneration does not prevent the perceptual effects of contextual information; instead, prior expectations are applied inflexibly. The precision of predictions correlates with beta power, in line with theoretical models of the neural instantiation of predictive coding. Fronto-temporal interactions are enhanced while participants reconcile prior predictions with degraded sensory signals. Excessively precise predictions can explain several challenging phenomena in frontal aphasias, including agrammatism and subjective difficulties with speech perception. This work demonstrates that higher-level frontal mechanisms for cognitive and behavioural flexibility make a causal functional contribution to the hierarchical generative models underlying speech perception.

product of the sensory input and prior precision, with the weighting determined by the congruency of the written cue. Bottom right: the modelled clarity rating, calculated as the height of the posterior above perceptual threshold, individually normalised to a 1-4 rating scale.
Supplementary Figure 3: Root mean square averaged evoked response across all sensors for each modality, after averaging across conditions and participants. The time windows over which the evoked brain sources were reconstructed are depicted by the areas shaded in gray (90-150ms, 200-280ms, 290-440ms, 450-700ms). Scalp topographic plots display the average evoked response within each window.

Supplementary Figure 4: A: Scalp topologies for the group by congruency interaction in the beta frequency band. B: Scalp topologies for the group by congruency interaction in the alpha frequency band. C: eLORETA source reconstructions for 300-450ms, 450-600ms and 600-750ms, corresponding to time windows displaying a greater effect of congruency in controls, no group by congruency interaction, and a greater effect of congruency in nfvPPA, respectively. Only left hemispheres are shown, as no significant right-sided sources were demonstrated.

Supplementary Figure 5: Basic auditory processing, after Grube et al. 22. The tasks employed were pitch change detection, 2Hz FM detection, 40Hz FM detection, and dynamic ripple density discrimination.

Supplementary Tables
Supplementary Table 1

Basic Auditory Processing
In this larger cohort we replicated the finding of Grube et al. 1, namely that patients with nfvPPA perform very poorly at some tasks of basic auditory processing. This does not seem to have a trivial explanation such as an inability to sustain attention or yes/no confusion, as the individual adaptive tracks have a similar shape in patients and controls (supplementary figure 5), with consistent correct responses in 'easy' trials and, once threshold is reached, flat profiles maintained for the remainder of a run. The pattern of performance was highly variable between individuals, but highly consistent within individuals. As can be seen in the individual Z-scores for each task, some patients were able to consistently perform some discrimination tasks in the normal range, while being dozens of standard deviations poorer than the mean in other tasks. Patients who performed well on a particular task continued to perform well if it was repeated but, even after repeated practice, they remained unable to perform well on tasks that they had previously found difficult.

Supplementary Discussion
Basic Auditory Processing in nfvPPA
In our larger cohort, we confirmed the previous suggestion that deficits in auditory processing are over-represented in patients with nfvPPA (supplementary figure 5) 1,2. On the face of it, this seems a surprising finding, as the Bayesian VBM provides evidence for no atrophy in primary auditory regions (figure 1E), and at post mortem patients with nfvPPA do not display disproportionate pathology in either primary auditory cortex or auditory brainstem nuclei. Further, it is known that patients with progressive supranuclear palsy, who do have severe brainstem atrophy, continue to display complex auditory psychophysical effects late in disease 3. Similarly, our patients with nfvPPA demonstrated good performance at identifying vocoded words, performing almost as well as controls (figure 2D). Finally, the pattern of psychophysical deficits was observed to be highly variable (supplementary figure 5); impairment of a bottom-up perceptual process would predict a consistent profile of performance that might vary in severity, while what we observe is that all individuals perform very poorly on some tasks, but the relative difficulty of the tasks varies between individuals.

A higher-level, cortical explanation must therefore be invoked to account for these psychophysical findings. This effect has previously been understood in terms of impaired working memory 1, but it is worth considering whether it might also be explained by the abnormalities of predictive coding demonstrated here. Basic auditory processing is traditionally assessed with two- or three-alternative forced choice paradigms, with the difference between exemplars adaptively modified to track a given performance percentile 4. A predictive coding model can also be applied to this experimental context, in which subjects make a decision based on the location of the peak of their posterior in the perceptual dimension of interest. The distribution of this posterior is based on a prediction that is modified by prediction error induced by sensory input. For example, in a task where one is asked to detect the presence of a pitch change, the subject might listen to the first pitch and then set up a prediction that the second pitch would be unchanged. A decision would be made by performing two-point discrimination on the peak locations of the prior and posterior distributions. If all subjects establish similar predictions, the accuracy of this decision process is dependent only on the precision of the sensory input. If, however, subjects with nfvPPA make more precise predictions, the perceptual distance between prior and posterior would be reduced, leading to poor discriminatory performance even though the sensory input is unchanged. This also explains the lack of a hierarchical relationship in performance profiles (supplementary figure 5); for example, to discriminate the density of dynamic ripples it is necessary to be able to process frequency modulations, and yet some patients were able to perform within the normal range at discriminating ripples whilst seemingly unable to detect their building blocks (i.e. frequency modulation).

This explanation in terms of abnormally precise predictions is not exclusive of that in terms of impaired working memory (which could be modelled here as a drift in the location of the prior distribution over time), and indeed both processes could be occurring simultaneously. The argument we make is simply that impaired predictive coding could, of itself, be a sufficient explanation for measured impairments in basic sound discrimination in nfvPPA.
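The dependence of discrimination on prior precision described above can be illustrated with a minimal numerical sketch. This is not the model fitted in this study; it assumes, purely for illustration, that both the prediction and the sensory evidence are Gaussian, so that the posterior peak is the precision-weighted average of the two. The function names, the arbitrary perceptual units and the precision values are hypothetical.

```python
def posterior_peak(prior_mean, prior_precision, sensory_mean, sensory_precision):
    """Peak of the posterior for a Gaussian prior and Gaussian sensory evidence.

    Assumption for illustration: precision is the inverse variance, and the
    posterior mean is the precision-weighted average of prior and input.
    """
    total_precision = prior_precision + sensory_precision
    return (prior_precision * prior_mean
            + sensory_precision * sensory_mean) / total_precision


def perceived_change(true_change, prior_precision, sensory_precision):
    """Distance between prior and posterior peaks in a change-detection trial.

    The prior predicts 'no change' (mean 0); the sensory evidence is centred
    on the true change, in arbitrary perceptual units.
    """
    prior_mean = 0.0
    peak = posterior_peak(prior_mean, prior_precision, true_change, sensory_precision)
    return abs(peak - prior_mean)


# Identical sensory input and sensory precision in both cases (hypothetical values).
typical_prior = perceived_change(1.0, prior_precision=1.0, sensory_precision=1.0)
overly_precise_prior = perceived_change(1.0, prior_precision=9.0, sensory_precision=1.0)

print(typical_prior, overly_precise_prior)
```

Under these assumptions, increasing the precision of the prior while holding the sensory input constant shrinks the separation between prior and posterior peaks, which is the quantity on which the two-point discrimination is proposed to operate.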

Differences in implicit learning
To address the question of whether our findings can be accounted for by group differences in implicit learning, we exploited the fact that the repetition of experiment 1, with the addition of neutral primes, was always performed after the MEG session and after experiment 2. Mismatch clarity ratings in nfvPPA for 8- and 16-channel speech were very slightly higher at this repetition than during the MEG session, which we speculate may reflect a combination of implicit learning 5 and a decrease in the likelihood of prior congruency from 50% to 33%. Nonetheless, the large group by congruency interaction remained. This implies that perceptual learning of degraded speech across the experiment cannot account for the difference in patients between low clarity ratings in the mismatch and neutral conditions of experiment 1 and good vocoded word report in experiment 2.