Prediction is Production: The missing link between language production and comprehension

Language comprehension often involves the generation of predictions. It has been hypothesized that such prediction-for-comprehension entails actual language production. Recent studies provided evidence that the production system is recruited during language comprehension, but the link between production and prediction during comprehension remains hypothetical. Here, we tested this hypothesis by comparing prediction during sentence comprehension (primary task) in participants having the production system either available or not (non-verbal versus verbal secondary task). In the primary task, sentences containing an expected or unexpected target noun-phrase were presented during electroencephalography recording. Prediction, measured as the magnitude of the N400 effect elicited by the article (expected versus unexpected), was hindered only when the production system was taxed during sentence context reading. The present study provides the first direct evidence that the availability of the speech production system is necessary for generating lexical prediction during sentence comprehension. Furthermore, these important results provide an explanation for the recruitment of language production during comprehension.

gender-marked articles; "una corona"/a crown -"un sombrero"/a hat). Such ERP modulation (i.e., larger amplitude for ERP component elicited by the unexpected relative to expected article) has been repeatedly observed and interpreted as a marker of lexical prediction, by taking advantage of gender-marked determiners in Spanish [18][19][20] , gender-inflected adjectives in Dutch 21 and phonological properties of English (indefinite article "a" changed to "an" if the following noun begins with a vowel 22,23 but see 24 for a lack of replication).
We compared three groups of participants reading highly constrained Spanish sentences containing expected versus unexpected noun-phrases (primary task). Lexical prediction effects were measured through ERP N400 modulations on the article (whose gender was congruent or not with that of the most expected target noun) and compared across the three groups differing in the secondary task. To test whether taxing the production system would reduce lexical prediction, the SP (Syllable Production) group was assigned a verbal secondary task (i.e., articulatory suppression, AS) preventing participants from using their inner speech (pronouncing the syllable /ta/ once on every word display). As a control for double-tasking, the TT (Tongue-tapping) group was assigned a non-verbal secondary task similar to AS but without requiring verbalization (tapping the tongue loudly once on every word). As a control for auditory feedback perception (inherently happening in the SP group), the SL group was assigned a 'Syllable Listening' secondary task (listening to own voice pronouncing /ta/ on every word). If the production system is necessary to build up predictions, the N400 expectation effect elicited by the article should be reduced in the SP group relative to the control groups. As a control for proper sentence processing and lexical integration, we expected a significant N400 effect on critical nouns in the three groups.

Material and Methods
Participants. Sixty Spanish native speakers took part in the experiment. They were randomly assigned to three groups. The sample size was chosen based on previous ERP studies reporting N400 effects in sentence processing 19,20,23 . Twenty participants (9 females; age range 19-30, mean: 25 ± 3) were assigned to the 'Syllable Production' (SP) group. Twenty participants were assigned to the 'Tongue-tapping' (TT) group. Two participants were removed from analyses because of a large number of artefacts in the electroencephalogram recording (more than 50% of trials removed after artefact rejection). The final TT group consisted of 18 participants (11 females; age range 19-30, mean: 24 ± 3). Twenty participants were assigned to the 'Syllable Listening' (SL) group. For the same reasons as in the TT group, 2 participants had to be removed from analyses, the final SL group thus consisting of 18 participants (11 females; age range 19-30, mean: 23 ± 3). The three groups were matched on age (F[2,53] = 1.48, p = 0.24). All participants were right-handed, had normal or corrected-to-normal vision, and did not report any reading or neurological disorder. All participants signed an informed consent form before taking part in the study, which was approved by the BCBL ethics committee. The experiment was performed in accordance with relevant guidelines and regulations. Participants received a payment of 10€ per hour for their participation.

Materials.
Stimuli consisted of 100 sentence contexts with two possible critical noun-phrases (article + noun): expected or unexpected (e.g., "El rey llevaba en la cabeza una corona/un sombrero antigua/antiguo" -"The king wore on his head an old crown [Fem] /hat [Masc] "; see Table 1 for other examples of sentences). In 50 sentence contexts, the expected noun was masculine ("un/el + noun" expected noun-phrase) and the unexpected noun was feminine ("una/la + noun" unexpected noun-phrase). In the other 50 sentence contexts, the expected noun was feminine and the unexpected noun was masculine (all critical nouns were inanimate). The 200 sentences were divided into two lists of 100 and each participant was presented with one list (matched across groups). Sentence contexts and critical noun-phrases were used only once per list. Each list contained 50 expected and 50 unexpected noun-phrases. There were no semantic or syntactic violations as critical noun-phrases were always semantically and syntactically correct, albeit that one was more expected than the other (see Table 1). There were no gender violations such as in "la sombrero -the [Fem] hat [Masc] " or "el corona -the [Masc] crown [Fem] ". The target noun-phrase was never in sentence final position. Across sentences, the critical article was in position 13.1 (SD 3.7; range: 6-24) and followed by 2.2 (SD 1.1) extra words (range: 1-6).
The mean cloze probability of expected and unexpected critical nouns was assessed by native speakers of Spanish (N = 20) who did not take part in the experiment. These participants were presented with sentences truncated before the critical noun-phrase and asked to complete the sentence with the first continuation that came to their mind. The cloze probability of a noun was defined as the proportion of times it was used as continuation. The mean cloze probability for expected nouns and for expected whole NPs was respectively 0.86 (SD 0.09; range 0.6-1), and 0.84 (SD 0.10; range 0.4-1); the mean cloze probability for unexpected words and unexpected NPs was 0.00 (SD 0.01; range 0.0-0.05), and 0.00 (SD 0.01; range 0.0-0.05). Expected nouns (and NPs) had larger cloze probabilities than unexpected nouns (and NPs; all ps < 0.001).
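The cloze probability defined above can be illustrated with a minimal sketch. The function name, the normalization of responses (trimming and lower-casing) and the example responses are assumptions made for illustration only; they are not the authors' procedure or data.

```python
from collections import Counter


def cloze_probabilities(completions):
    """Return, for each continuation, the proportion of raters who produced it."""
    # Assumption: responses are compared after trimming and lower-casing.
    normalized = [c.strip().lower() for c in completions if c.strip()]
    total = len(normalized)
    if total == 0:
        return {}
    counts = Counter(normalized)
    return {phrase: n / total for phrase, n in counts.items()}


# Hypothetical completions from 20 raters for the truncated context
# "El rey llevaba en la cabeza ..." (illustrative, not data from the study).
responses = ["una corona"] * 17 + ["un casco"] * 2 + ["una tiara"]
probs = cloze_probabilities(responses)

expected_cloze = probs.get("una corona", 0.0)    # 17 of 20 raters
unexpected_cloze = probs.get("un sombrero", 0.0)  # never produced
print(expected_cloze, unexpected_cloze)
```

A continuation that no rater produced simply receives a cloze probability of zero, which is the situation reported for most unexpected noun-phrases.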
Within each list, expected and unexpected target nouns were matched (based on EsPal database 25 ) for grammatical gender, word frequency, number of letters, number of neighbors, number of syllables, familiarity, imageability, concreteness, averaged position of the critical article in the sentence, and averaged number of words following the critical NP (see Table 2). Expected and unexpected target nouns only differed in cloze probability. Critical words were also balanced across lists on all the critical variables ( Table 2).
Experimental design. The EEG experiment was run in a soundproof electrically shielded chamber.
Participants were seated in a chair, about sixty centimeters in front of a computer screen. Stimuli were delivered with the Presentation software (https://www.neurobs.com/). Participants had to read sentences displayed one word at a time (200 ms + 500 ms inter-stimulus blank interval) in the center of the computer screen, on a grey background. Sentence words were displayed in red until 3 words before the critical article and in white from 2 words before the critical article until the final word of the sentence. Each sentence was preceded by a fixation cross displayed for 2000 ms. The common instruction for the three groups was to read each sentence silently and to answer 'yes' or 'no' to the following comprehension question by pressing a YES or NO button on a keyboard. Comprehension questions were inserted after each sentence to keep participants engaged in the silent reading task, and to get a complete assessment of sentence comprehension (to make sure that some dual-tasks were not more disturbing than others in terms of sentence comprehension).
Apart from reading sentences for comprehension, participants received other instructions varying depending on the group they were assigned to. Participants in the SP group were asked to produce the syllable /ta/ each time a red word was displayed, and to stop doing so when the words started to turn white (2 words before the critical article). This way, we made sure that double-tasking was performed while reading the first words of the sentence (i.e., sentence context used to build up predictions) and that it stopped before the target article. This was crucial to avoid contamination by muscular activity of the ERPs time-locked on the target article. Participants in the TT group were asked to perform tongue-tapping each time a red word was displayed, and to stop doing so on words displayed in white. Note that since sentences were presented one word at a time on the computer screen, regularity for the secondary task was provided by the regularity of word display with no need of including beats (as usually done in AS experiments; see 16 ). Note also that the SP and TT groups performed a secondary task with similar cognitive burden, both including motor action and feedback perception. The only difference between the two tasks was in the "linguistic status" of the articulation and feedback, being a syllable in the SP group and a noise in the TT group. Finally, participants in the SL group were informed that, during reading, they were going to listen to their own voice pronouncing /ta/ on each word displayed in red, and no longer once the words turned white. To this end, after signing the consent form and before the electrode cap was prepared, participants assigned to the SL group were asked to pronounce the syllable /ta/ several times in front of a microphone. Ten different utterances of the syllable were then extracted and inserted into the program, so that each participant would listen to their own voice. Throughout the experiment, each word displayed in red was presented together with one of the 10 utterances, randomly assigned. Each utterance was played 360 ± 75 ms after the word onset (range 285-435 ms), randomly, in order to mimic latencies of feedback perception during /ta/ production (SP group).
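The timing of one SL-group trial can be sketched as follows. This is an illustration under stated assumptions rather than the actual Presentation script: the uniform distribution of the audio delay is assumed (the text only gives 360 ± 75 ms and the 285-435 ms range), time zero is set at the onset of the first word (the fixation cross is ignored), and the function and field names are hypothetical.

```python
import random

WORD_MS = 200                     # word display duration (from the text)
BLANK_MS = 500                    # inter-stimulus blank interval (from the text)
SOA_MS = WORD_MS + BLANK_MS       # onset-to-onset interval between words
AUDIO_DELAY_MS = (285, 435)       # range from the text; uniform sampling is assumed
N_UTTERANCES = 10                 # recorded /ta/ utterances per participant


def build_schedule(n_words, article_index, rng):
    """Return one event per word for a single sentence.

    article_index is the 0-based position of the critical article.
    Words are red up to and including the word 3 positions before the
    article, and white from 2 positions before the article onwards.
    """
    schedule = []
    for i in range(n_words):
        onset = i * SOA_MS
        is_red = i <= article_index - 3
        event = {
            "word_onset_ms": onset,
            "color": "red" if is_red else "white",
            "utterance": None,
            "audio_onset_ms": None,
        }
        if is_red:
            # One of the 10 utterances, played after a random delay.
            event["utterance"] = rng.randrange(N_UTTERANCES)
            event["audio_onset_ms"] = onset + rng.randint(*AUDIO_DELAY_MS)
        schedule.append(event)
    return schedule


rng = random.Random(0)  # fixed seed so the sketch is reproducible
schedule = build_schedule(n_words=15, article_index=12, rng=rng)
for event in schedule:
    print(event)
```

With the article in position 12 (0-based), words 0 to 9 are red and accompanied by an utterance, and words 10 and 11 are already white, so no sound is scheduled close to the critical article.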
All participants were explicitly encouraged to focus on sentence comprehension and to try to avoid distraction from the second task. Participants were informed that if they stopped performing the secondary task (in the case of SP and TT groups), the experimenter would remind them to continue. Note that no participant had to be reminded of the secondary task, probably because the red display of the first words of each sentence was a clear signal reminding the participants they had to start the secondary task again.

Table 1. Examples of sentences. Critical expected/unexpected noun-phrases are depicted in red. English translations are provided, below each sentence, in italic.

Antes de entrar al piso tuvo que quedar con el propietario para firmar el contrato/la escritura ante notario.
Before entering the flat, I had to plan to meet the owner to sign the contract/the deed with a public notary.

When you get disoriented, look at the compass which always shows the north/the direction and the path.

Nunca sé dónde llevar mi móvil y mi cartera, tengo que comprarme un bolso/una mochila que combine con todo.
I never know how to carry my mobile and my purse, I need to buy a bag/a backpack to carry it all.

Para pedirle matrimonio se arrodilló ante ella y le dio un anillo/una joya brillante.
To ask her to marry him, he knelt in front of her and gave her a sparkling ring/gem.

To be cautious, you should put on the integral helmet/the leather jacket each time you drive your motorbike.

When he was young, he used to play in a famous group/band.

Desde la terraza del apartamento de la playa se podía ver el mar/la catedral y los surfistas/y el mar.
From the terrace of the beach apartment one could see the sea/the cathedral and the surfers/and the sea.

Para cortar la carne se necesita un cuchillo/una tabla de metal/y un cuchillo.
To cut the meat one needs a knife/a board of metal/and a knife.

El rey llevaba en la cabeza una corona/un sombrero antigua/antiguo.
The king wore an ancient crown/hat on his head.

El símbolo del catolicismo es la cruz/el pez en muchas iglesias.
The symbol of Catholicism is the cross/the fish in many churches.

La ropa está sucia, ponla en la lavadora/el suelo por favor.
The clothes are dirty, put them in the washer/on the floor please.

Kids build sand castles on the beach/in the playground during summer/recess.

Acabo de salir de casa y no recuerdo si he cerrado la puerta/el armario cuando me he ido.
I just left home and I cannot remember if I closed the door/the cupboard when I left.

Cada invierno se hace una campaña para vacunar a la gente mayor contra la gripe/el virus común/de la gripe.
Every winter a vaccination campaign against the common flu/the flu virus is organized for older people.

Se despertó sudando y temblando, había tenido una pesadilla/un sueño terrible.
He woke up sweating and shivering, he had a terrible nightmare/dream.

Everything went dark because of a sudden lack of (the) light/sun.
Stimuli were presented in four blocks of 25 sentences, with a small break between the blocks. A brief practice session included three sentences, and the corresponding yes-no questions. Overall, the experiment lasted one hour and 30 minutes on average.
Electrophysiological recording and statistical analyses. Electrophysiological data were recorded from 27 TiN electrodes placed according to the 10-20 convention (Easycap; Fp1/2, F7/8, F3/4, FC5/6, FC1/2, T7/8, C3/4, CP1/2, CP5/6, P3/4, P7/8, O1/2, F/C/Pz). Additional electrodes were placed over the left (on-line reference) and right mastoids. A forehead electrode served as the ground. Four electrodes were placed around the eyes (VEOL, VEOR, HEOL, HEOR) in order to detect blinks and eye movements. Data were amplified (Brain Amp DC) with a bandwidth of 0.01-100 Hz, at a sampling rate of 250 Hz. Impedances were kept below 5 kOhm for the scalp electrodes and 10 kOhm for the eye electrodes. Recordings were off-line re-referenced to the average activity of the two mastoids and re-filtered with a 30 Hz low pass filter (48 dB/oct) and a 0.1 Hz high pass filter (12 dB/oct). Eye blink artifacts were corrected using the procedure of Gratton et al. 26 , implemented in Brain Vision Analyzer 2.0 (Brain Products, München, Germany), and any remaining artifacts exceeding ±100 μV were dismissed. On average 7.41% of epochs were considered artifacts. The number of dismissed epochs was slightly larger for unexpected relative to expected nouns (F [1,53]
Higher cognitive load has been associated with slower performance but also with increased distraction and therefore increased variability in response 29,30 . Consequently, we can safely assume that if any of the secondary tasks was associated with higher cognitive load, performance would be more variable in the group undergoing this task (i.e., larger standard deviations in performance should be observed). Thus, we also explored variability in performance by computing the standard deviation in accuracy and reaction time for each participant, and testing whether those standard deviations were affected by the secondary task (i.e., differed across groups). Neither standard deviations in accuracy (SP group: 0.39 ± 0.05; TT group: 0.38 ± 0.07; SL group: 0.36 ± 0.06) nor standard deviations in reaction times (SP group: 1551 ± 779 ms; TT group: 1317 ± 401 ms; SL group: 1662 ± 886 ms) significantly differed across the three groups (F[2,53] = 1.11, p = 0.34, η² = 0.040 and F[2,53] = 1.07, p = 0.35, η² = 0.039 respectively).

ERP data. The ERP pattern elicited by the critical noun-phrases is depicted in Fig. 1. The distribution of the late ERP component elicited by the article is consistent with the long-lasting effect that has been previously reported in similar experiments on lexical prediction, and consistently labeled N400 [e.g. 20,22,23 ]. Whether such a component should be assimilated to the classical N400 component or not is open to debate. Nevertheless, since the interpretation of our results does not depend on the component per se (but on the modulation of ERPs by expectation), we will use the N400 label in order to follow the literature. Post-hoc analyses of the Group × Laterality × Expectation interaction revealed a significant expectation effect on the left, medial and right sites in the TT group (all ps < 0.001). The expectation effect was significant on the medial and right sites in the SP group (both ps < 0.001) but was not significant on the left sites (p = 0.086). The expectation effect was significant on the left sites in the SL group (p = 0.003) but was not significant on the medial and right sites (p > 0.99).
To summarize, the expectation effect on the critical article was significant in the two control groups (TT and SL groups) but did not reach significance in the SP group. The magnitude of this effect did not significantly differ between the TT and SL groups. The expectation effect on the critical noun was significant in the three groups (left lateralized in the SL group). The magnitude of this effect did not significantly vary across the three groups.
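The expectation effect summarized above is a difference in mean amplitude between conditions. The sketch below shows one way such a measure could be computed for a single channel; it is not the Brain Vision Analyzer pipeline used in the study. The 300-500 ms window, the 250 Hz sampling rate and the ±100 μV rejection criterion come from the text, whereas the assumption that each epoch starts at word onset, the function names and the toy data are illustrative only.

```python
SAMPLING_RATE_HZ = 250     # from the recording description
WINDOW_MS = (300, 500)     # N400 measurement window after word onset
REJECT_UV = 100.0          # epochs exceeding +/-100 microvolts are dismissed


def window_indices(window_ms, rate_hz):
    """Convert a time window in ms to inclusive sample indices.

    Assumption: sample 0 of every epoch corresponds to word onset.
    """
    start = int(round(window_ms[0] * rate_hz / 1000))
    stop = int(round(window_ms[1] * rate_hz / 1000))
    return start, stop


def mean_window_amplitude(epochs):
    """Mean amplitude in the window, averaged over artifact-free epochs."""
    start, stop = window_indices(WINDOW_MS, SAMPLING_RATE_HZ)
    kept = [e for e in epochs if max(abs(v) for v in e) <= REJECT_UV]
    if not kept:
        raise ValueError("all epochs were rejected")
    n_samples = stop - start + 1
    per_epoch = [sum(e[start:stop + 1]) / n_samples for e in kept]
    return sum(per_epoch) / len(per_epoch), len(kept)


def expectation_effect(expected_epochs, unexpected_epochs):
    """Expected minus unexpected mean amplitude (as in the topographic maps)."""
    expected_mean, _ = mean_window_amplitude(expected_epochs)
    unexpected_mean, _ = mean_window_amplitude(unexpected_epochs)
    return expected_mean - unexpected_mean


def toy_epoch(window_value, n_samples=150):
    """Flat 600 ms epoch with a constant deflection inside the window."""
    start, stop = window_indices(WINDOW_MS, SAMPLING_RATE_HZ)
    return [window_value if start <= i <= stop else 0.0 for i in range(n_samples)]


# Toy data: unexpected articles are 2 microvolts more negative in the window;
# the last unexpected epoch is an artifact and is dropped.
expected = [toy_epoch(0.0), toy_epoch(0.0)]
unexpected = [toy_epoch(-2.0), toy_epoch(-2.0), [150.0] * 150]
effect = expectation_effect(expected, unexpected)
print(effect)
```

Because the N400 is more negative for unexpected words, the expected-minus-unexpected difference is positive when an expectation effect is present.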

Discussion
The aim of the present study was to determine whether the link between the production system and language comprehension is prediction 5 . To do so, we capitalized on recent frameworks arguing that prediction during comprehension is based on actual production. In other words, we hypothesized that the production-comprehension link might be explained, at least partly, by the mandatory role of production in prediction. To test this hypothesis, we explored whether the availability of the production system was indeed necessary for prediction during sentence comprehension. We measured the magnitude of the lexical expectation effect during sentence reading (N400 effect elicited by expected relative to unexpected noun-phrases) in three groups of participants differing in a simultaneous secondary task: syllable production (aimed to tax the production system by preventing subvocal rehearsal of the verbal input; SP group), tongue-tapping (aimed to mimic syllable production without taxing the production system; TT group) and syllable listening (aimed to mimic feedback perception inherently associated with syllable production in the SP group; SL group). We hypothesized that the expectation effect should be larger in the TT group relative to the SP group, if prediction requires availability of the speech production system. In addition, we hypothesized a larger expectation effect in the SL group relative to the SP group if taxing the production system, and not own-voice feedback perception, was responsible for the reduced prediction in the SP group.

Fig. 1. Black lines depict ERPs measured for expected noun-phrases; red lines depict ERPs measured for unexpected noun-phrases. ERPs measured over the Medial Anterior (FP1, FP2, Fz), Medial Central (C3, C4, Cz) and Medial Posterior (P3, P4, Pz) scalp. Grey areas indicate the time-windows used to measure the N400 wave elicited by the article (300-500 ms after the article onset) and the N400 wave elicited by the noun (300-500 ms after the noun onset). Negativity is plotted up. Bottom panel: Topographical maps of the N400 effect (expected minus unexpected conditions) elicited by the article and noun in the SP, TT and SL groups. Each map depicts the mean amplitude of the expected-unexpected difference in the 300-500 ms time-window following the critical word, from −2 to 2 μV for the article and from −1 to 2 μV for the noun (except for the noun in the TT group: from 0 to 3 μV).
The results revealed that the expectation effect was reduced in the SP group relative to both the TT and SL groups. Only participants in the TT and SL groups actively predicted upcoming words during sentence reading (significant expectation effect on the critical article [18][19][20]). These findings show that taxing the production system (here, preventing subvocal rehearsal of the verbal input during sentence context reading) hinders prediction during sentence comprehension. Crucially, performing articulatory movements and perceiving associated feedback (TT group) or listening to own speech during reading (SL group) are not the factors responsible for the reduced expectation effect in the SP group. With the present experimental series we provide the first direct evidence for a strong and relevant implication of the production system in lexical prediction during sentence reading. It remains to be explored whether the production system plays a crucial role in other types of prediction (e.g., semantic, phonological prediction) and whether its major impact on prediction generalizes to other experimental settings. In fact, the production system might play a critical (even mandatory) role in 'prediction by simulation' (prediction based on own-body experience) and not in 'prediction by association' (prediction based on previous perceptual experience; see 9,15 ). For now, our results support the view that the production system plays a critical role in lexical prediction during sentence comprehension 6,7,31,32 . Importantly, our results are relevant not only for models of language comprehension, but also for the open and crucial question of the link between production and comprehension in language 5 . Going one step beyond previous studies showing that the production system is engaged during speech perception [1][2][3][4]33 , we show that such involvement of the production system during comprehension can be explained, at least partly, by its major role in prediction.
It could be argued that the reduction of prediction effects in the SP group was the consequence of larger cognitive burden in the syllable production secondary task, relative to the tongue tapping and syllable listening secondary tasks. Nevertheless, previous studies revealed that articulatory suppression and tapping do not differ in the level of disruption they entail in several non-linguistic cognitive tasks such as digit size judgment tasks 34 and task switching 35,36 , suggesting that those secondary tasks do not drastically vary in the amount of cognitive load they imply. Furthermore, the only difference between the syllable production and tapping tasks in the present study was in the linguistic status of the production (being a syllable or a noise), which also indicates that the cognitive load imposed by those secondary tasks was similar. Finally, performance both in terms of average reaction times and accuracy, and associated variability measures in the comprehension questions did not significantly differ across groups. Taken together, this pattern of results suggests that the level of cognitive load was similar across the three groups [27][28][29][30] . Still, we cannot entirely rule out the possibility that a larger cognitive load entailed by the syllable production task was affecting prediction and not comprehension. Future work will be needed to deeply explore cognitive load implied by articulatory suppression during reading and whether it can affect prediction specifically.
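The variability check invoked in this argument (per-participant standard deviations compared across the three groups) can be sketched as a one-way between-group ANOVA. The code below is an illustration with simulated reaction times, not the study data or the authors' analysis script; the group sizes (20, 18, 18) follow the Participants section, while the simulated distribution, the number of trials per participant and all function names are assumptions. Only the F statistic is computed; obtaining a p-value would require the F distribution, which is left out here.

```python
import random
from statistics import mean, stdev


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way between-group ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


def simulate_rt_sds(n_participants, rng, n_trials=100):
    """Per-participant SD of simulated reaction times (hypothetical values)."""
    sds = []
    for _ in range(n_participants):
        rts = [rng.gauss(2000, 1400) for _ in range(n_trials)]
        sds.append(stdev(rts))
    return sds


rng = random.Random(42)  # fixed seed for reproducibility
sd_by_group = {
    "SP": simulate_rt_sds(20, rng),
    "TT": simulate_rt_sds(18, rng),
    "SL": simulate_rt_sds(18, rng),
}
f_value, df_between, df_within = one_way_anova_f(list(sd_by_group.values()))
print(f"F[{df_between},{df_within}] = {f_value:.2f}")
```

With 56 participants in three groups, the degrees of freedom are 2 and 53, matching the F[2,53] values reported for the variability analyses.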
Our results cannot speak to the nature of the prediction and the role of production in it. Participants in the two control groups certainly built specific lexical predictions of the upcoming noun and its gender. Whether such prediction involves phonological and/or phonotactic representations (of the noun and/or article) remains to be explored (see 37 and 24,38 for evidence for and against prediction involving phonological representations). Without concluding on the nature of the prediction (i.e., the "what"; see 1 ), we can assert that the way lexical predictions are built rests on subvocal rehearsal of the verbal input 17 during context reading, or at least on production processes made inoperative by articulatory suppression. Interestingly, we can state that the availability of the production system plays a crucial role during context reading (and not only at a late point in time close to the predictable input), given that the articulatory suppression in the SP group took place during context reading and stopped 3 words before the display of the critical word of the sentence. Thus, the role of the production system seems to be predominant when constraining semantic information is gathered from the sentence context. The availability of the production system late in time (a few words before the predictable input) is not sufficient to build up lexical predictions, revealing an important role of production in assembling semantic information from the context. Whether the production system is also necessary for selecting the most expected word itself cannot be determined, given that production was no longer blocked during such preparatory processes, which are likely to happen a few words before the critical noun-phrase.
The present results are also relevant to the current interest in variability in predictive processes 39,40 . Although many researchers agree that readers actively predict upcoming information during sentence comprehension, it is also widely acknowledged that such predictive processes vary across participants, task requirements and contexts 13,15,39 . Many authors agree that predictive processes should be affected by cognitive resource availability and cognitive control 39,41 but evidence of it is scarce. Ito and colleagues 42 recently showed that predictive processing was delayed when participants had to perform a secondary working memory task during sentence listening (see also 43 and 44 for evidence for and against an involvement of working memory capacities in prediction). Thus, prediction might not be robust enough to be unaffected by verbal working memory load. By contrast, the present results tend to show some sort of impermeability of predictive processes to cognitive resource availability, given that the expectation effect was largely significant in both the TT and SL groups, despite the concurrent non-verbal secondary task. Note also that the magnitude of the N400 effect elicited by the article in the TT and SL groups is similar to the one reported in a previous experiment using a similar paradigm and materials but without double-tasking 20 . Thus, we provide the first piece of evidence that prediction is a cognitive process robust enough to survive non-verbal double-tasking. This is not to say that comprehenders always predict, but that prediction might be automatic as long as the language production system and verbal working memory are available. Since it could be that our control tasks were not cognitively demanding enough to significantly affect any other process, further research is needed to conclude on the relative automaticity of prediction depending on the type and amount of cognitive load at play.
Interestingly, it has been shown that prediction effects are reduced in second language (L2) readers 23,45 , in low literate adults 46 , in children with poor vocabulary 11 and in older adults 10 . Such reduced prediction effects can arguably be linked to reduced cognitive resource availability (e.g., reduced verbal working memory), but the present results offer an interesting new perspective: The lack of prediction in certain populations might reflect a lack of fast and efficient engagement of production processes during comprehension. This assumption should be tested for a full understanding of the production/comprehension functional link.
Finally, the results observed on the N400 component elicited by the noun are also informative. This N400 component reflects semantic processing of the critical noun, which can be influenced by sentence context through passive resonance, message-level build up, but also through after-effects of prediction (see 23,47 for extensive discussion). Here, we observed a significant expectation effect on the critical noun in the three groups. This shows that integration of the critical noun was not hindered by the lack of prediction in the SP group. As previously shown in L2 readers, the most expected critical noun was easier to integrate, based on previous sentence context, despite the absence of significant evidence of its active prediction 23,47 . This result is also in line with previous work showing that articulatory suppression does not prevent proper sentence comprehension 48 , and with neuropsychological evidence showing that aphasic patients with highly impaired production skills can have language comprehension somehow preserved (for a review see 49 ). Thus, production is necessary for prediction but prediction is not mandatory for proper integration. We can also assert that the unavailability of the production system during sentence context reading is not detrimental for proper integration of the predictable noun. Whether such unavailability would still have negligible effects on word integration if it was to happen during (and not only before) critical word display still has to be explored, given that, in the present study, the production system was not taxed anymore during predictable noun integration.
To conclude, the present study provides the first strong and direct evidence in favor of the hypothesis that prediction is production, showing that the availability of the speech production system during sentence context display is necessary to build up lexical prediction during reading. The major role of the production system for prediction in comprehension can explain the recruitment of production processes during language comprehension.