Word contexts enhance the neural representation of individual letters in early visual cortex

Visual context facilitates perception, but how this is neurally implemented remains unclear. One example of contextual facilitation is found in reading, where letters are more easily identified when embedded in a word. Bottom-up models explain this word advantage as a post-perceptual decision bias, while top-down models propose that word contexts enhance perception itself. Here, we arbitrate between these accounts by presenting words and nonwords and probing the representational fidelity of individual letters using functional magnetic resonance imaging. In line with top-down models, we find that word contexts enhance letter representations in early visual cortex. Moreover, we observe increased coupling between letter information in visual cortex and brain activity in key areas of the reading network, suggesting these areas may be the source of the enhancement. Our results provide evidence for top-down representational enhancement in word recognition, demonstrating that word contexts can modulate perceptual processing even in the earliest visual regions.

Supplementary Figure 1 Behavioural results. To make sure participants kept reading and were equally attentive to words and nonwords, they performed a challenging orthographic discrimination task. The task was performed on specific, learned targets that were presented about once per trial at an unpredictable moment. Targets were learned during a separate training session and were presented either in their regular (learned) form or with one of the non-middle letters permuted.
Whenever a target was presented participants had to report whether it was correctly 'spelled'.
Participants were faster (Wilcoxon signed-rank test, T = 40, p = 1.07 × 10⁻⁵, r = 0.87) but not statistically significantly more accurate (two-tailed t-test, t(34) = 1.70, p = 0.098, d = 0.29) for word compared to nonword targets. This is in line with word superiority, although the perceptual nature of this advantage cannot be established from behavioural results on this task alone, as memory or decisional factors might also contribute to the observed facilitation. Grey dots with connecting lines are individual participants. Colours are estimated densities, white dots are group medians, boxes are quartiles and whiskers are 1.5 × the interquartile range. Significance stars indicate p < 0.001 (***) in a paired two-tailed Wilcoxon signed-rank test.
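As a minimal sketch of the reaction-time comparison above, using entirely hypothetical data (not the study's measurements): the reported effect size is consistent with the matched-pairs rank-biserial correlation, which can be derived directly from the signed-rank statistic T.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired reaction times (s) for 35 participants;
# word targets are assumed to be responded to slightly faster.
rng = np.random.default_rng(0)
n = 35
rt_word = rng.normal(0.60, 0.05, size=n)
rt_nonword = rt_word + rng.normal(0.04, 0.03, size=n)

# Paired two-tailed Wilcoxon signed-rank test.
res = wilcoxon(rt_word, rt_nonword)
T = res.statistic  # smaller of the two signed-rank sums

# Matched-pairs rank-biserial correlation as an effect size:
# with the reported T = 40 and n = 35, 1 - 2*40 / (35*36/2) ≈ 0.87.
r = 1 - 2 * T / (n * (n + 1) / 2)
print(f"T = {T:.1f}, p = {res.pvalue:.2g}, r = {r:.2f}")
```

Note that this formula yields the magnitude of the effect when T is the smaller rank sum, which is what `scipy.stats.wilcoxon` returns for a two-sided test.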

Supplementary Figure 2 Simulated letter identification accuracies. All simulation parameters
were identical to the simulation of Figure 3a, except that median predicted response accuracy, rather than representational strength, was computed for the middle letter (see Methods). The fact that accuracies are virtually at 100% in all conditions shows that the stimuli were, despite the visual noise, clearly 'visible' to the network (note that chance level would be 3.84%, or 1/26). This reflects a key difference between our paradigm, in which stimuli were presented well above threshold, and the majority of studies in the literature, where stimuli are presented near threshold. These results confirm that even when the critical letter is clearly visible and predicted letter identification responses are virtually at 100%, theoretical models still predict that enhancement of representations can occur. The accuracy values here might appear to conflict with the accuracies in Supplementary Figure 1. Note, however, that in the behavioural task performance did not rely purely on perception of the letters but also on their comparison to a memory template, and that the task was performed on the outer letters while participants maintained fixation at the centre of the screen.
The middle letter was therefore always well-identifiable, making the predicted near-perfect accuracies a reasonable approximation of experimental viewing conditions.

Horizontal eye movements were quantified for each trial, then averaged per condition and compared within participants. Grey dots and connecting lines represent single participants, white dots group medians, and boxes and whiskers quartiles and 1.5 × the interquartile range. No statistically significant difference between conditions was found (paired t-test, t(32) = -1.43, p = 0.16). Two participants were not included because no eye tracking data of sufficient quality was available.

Supplementary Note 1 Spatial and retinotopic specificity
If the letter information extracted from visual cortex, and its enhancement by word contexts, indeed reflect sensory representations, then the MVPA results should be retinotopically specific. If, on the other hand, letter identity could be decoded from voxels throughout much of the brain, or if the enhancement was not retinotopically specific (e.g. reflecting a more general increase in signal-to-noise ratio), it would be more difficult to conclude that the MVPA results reflect sensory representations. We therefore tested for spatial specificity by running a searchlight version of the classification and pattern correlation analyses. In addition, we ran a more sensitive ROI analysis in native EPI space. Here, we used the resulting searchlight maps (containing classification and pattern correlation results for each voxel in a participant's native EPI space) and compared classification in the central ROI (using the functional definition described earlier) to a functionally defined peripheral ROI.
Voxels were deemed peripheral when they showed a strong response to stimuli in the main experiment (which spanned a large part of the visual field) but a weak or no response to stimuli in the localiser (which were presented near fixation). For this analysis we focused on V1, because it has the strongest retinotopy. Indeed, as can be seen in Supplementary Figure 9, overall letter decoding was greatly reduced for the peripheral ROI compared to the central ROI, both for the classification analysis (paired t-test, t(34) = 15.59, p = 8.86 × 10⁻¹⁷, d = 2.67) and the pattern correlation analysis (paired t-test, t(34) = 8.06, p = 2.65 × 10⁻⁹, d = 1.38).
Critically, a similar reduction in the peripheral ROI was found for the enhancement effect (the difference in decoding between conditions), again both for the classification analysis (paired t-test, t(34) = 2.56, p = 0.015, d = 0.44) and the pattern correlation analysis (paired t-test, t(34) = 2.92, p = 6.31 × 10⁻³, d = 0.50). Importantly, although we initially focused on V1 (Supplementary Figure 9), because it has the strongest retinotopy and because it was requested by the reviewer, a similar reduction was observed for our main ROI of interest, early visual cortex (i.e. the conjunction of V1 and V2). Specifically, here too we found greatly reduced overall letter decoding, both for the classification analysis (paired t-test, t(34) = 18.49, p = 5.52 × 10⁻¹⁹, d = 3.17) and the pattern correlation analysis (paired t-test, t(34) = 8.86, p = 3.02 × 10⁻¹⁰, d = 1.52).
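The paired comparisons above all follow the same recipe. A minimal sketch on synthetic decoding accuracies (all values hypothetical, not the study's data) shows how the paired t statistic and a paired-samples Cohen's d are obtained:

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-participant decoding accuracies (n = 35); the central
# ROI is assumed to decode letters far better than the peripheral ROI.
rng = np.random.default_rng(1)
n = 35
acc_central = rng.normal(0.45, 0.05, size=n)
acc_peripheral = rng.normal(0.26, 0.05, size=n)

# Paired (repeated-measures) t-test across participants.
t_stat, p_val = ttest_rel(acc_central, acc_peripheral)

# Cohen's d for paired samples: mean of the within-participant
# differences divided by their standard deviation.
diffs = acc_central - acc_peripheral
d = diffs.mean() / diffs.std(ddof=1)
print(f"t({n - 1}) = {t_stat:.2f}, p = {p_val:.2g}, d = {d:.2f}")
```

The same computation, applied to classification accuracies or pattern correlations per ROI, yields statistics of the form reported throughout this note.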