A 10-hour within-participant magnetoencephalography narrative dataset to test models of language comprehension

Recently, cognitive neuroscientists have increasingly studied brain responses to narratives. At the same time, we are witnessing exciting developments in natural language processing, where large-scale neural network models can be used to instantiate cognitive hypotheses about narrative processing. Yet these models learn from text alone, and we lack ways of incorporating biological constraints during training. To mitigate this gap, we provide a narrative comprehension magnetoencephalography (MEG) data resource that can be used to train neural network models directly on brain data. We recorded from 3 participants, in 10 separate hour-long recording sessions each, while they listened to audiobooks in English. After story listening, participants answered short questions about their experience. To minimize head movement, the participants wore MEG-compatible head casts, which immobilized their head position during recording. We report a basic evoked-response analysis showing that the responses accurately localize to primary auditory areas. The responses are robust and conserved across all 10 sessions for every participant. We also provide usage notes and briefly outline possible future uses of the resource.


Background & Summary
Conveying and understanding the world through narratives is a pervasive and fundamental aspect of human culture 1. The storytelling and sense-seeking propensity of the brain allows us to organize the temporally varying and fleeting nature of the world into a stable and meaningful chain of events, entities and their relations 2,3. In recent years, cognitive neuroscientists of language have seen an increased interest in studying brain responses when humans read or listen to narratives [4][5][6][7][8][9][10]. Narratives afford the study of brain responses to language input that is highly contextualized and contains a rich variety of temporal dynamics and representations (e.g. phonemic, lexical, phrasal, morphemic, sentential) 11. Given this inherent richness of neuronal dynamics, model-based investigations -where cognitive hypotheses on multiple levels are operationalized and tested simultaneously 12 -are particularly suitable in narrative neuroscience.
It is not a coincidence, then, that narrative cognitive neuroscience has been encouraged by the simultaneous development and availability of novel modeling techniques in the fields of natural language processing (NLP) and computational linguistics 6,13. Echoing the recent revival of research in artificial neural networks (ANNs) and deep learning 14,15, NLP has seen considerable progress in research applications using ANN architectures 16. Earlier breakthroughs were achieved by the development of so-called neural language models, that is, ANNs optimized to solve the next-word prediction task 17,18. This development was further underscored by the finding that the numeric vector word representations derived from such architectures develop human-interpretable structure. For example, it is well established that numeric word vectors can reflect the lexical similarity of the encoded words 19,20. Most recently, NLP has witnessed a trend towards pretraining generic large-scale neural language models on unprecedented amounts of raw text and successfully using the pretrained models for improved performance on downstream language tasks [21][22][23].
The availability of language models that can process connected text has increased the scope of the cognitive neuroscientist's toolkit for probing the relationship between computational language representations and neural signals. Mirroring the successes in computer vision 24 and the subsequent modeling of neural processing in visual perceptual hierarchies [25][26][27], computational linguists are beginning to interpret how language models achieve their task performance [28][29][30] and what the correspondence is between such pretrained model representations and the neural responses recorded while participants engage in similar language tasks [31][32][33][34][35][36][37]. On the one hand, task-optimized ANNs therefore serve as a tool and a framework that allow us to operationalize and identify which computational primitives serve as candidate hypotheses for explaining neural data [38][39][40][41]. On the other hand, data from neuroscience can provide important biological constraints for ANN-based processing architectures [42][43][44].
Yet, in order to incorporate neurophysiological constraints in the training of brain-based ANNs, we need sufficiently large and high-quality language-brain datasets and resources that allow direct training and testing of such complex models 43. Neural networks are known to be 'data-hungry'. That is, because the number of optimized parameters is typically very large, the models need to be constrained by sufficient amounts of data points in order to reduce the error variance during training 45. In studies where the cognitive hypothesis of interest is embodied in the experimental design, the nature of well-controlled, hand-crafted stimuli typically limits the number of available train-test samples within each participant. For example, the dataset from the landmark study by 31 contains fMRI data for a total of 60 nouns. Since then, the availability of model-based analysis approaches 13 has led to increased curation and sharing of dedicated language neuroimaging datasets that leverage larger numbers of data points across large numbers of participants 46,47. It is known that increasing the number of training repetitions within each participant improves the predictive performance of models. This was shown, for example, in predicting visually evoked MEG responses 48 and in speech/text synthesis on the basis of intracranial electrophysiology recordings 49,50. Given the recent interest in the interpretability of ANN language models 28 and the development of ANN models of brain function 51, care must be taken to ensure that the to-be-investigated models trained on neural data of individual participants are reliably estimated to begin with 52.
Here we describe a narrative comprehension magnetoencephalography (MEG) data resource recorded while three participants listened to nearly 10 hours of audiobooks each (see Fig. 1). MEG is a non-invasive technique for measuring magnetic fields induced by synaptic and transmembrane electric currents in large populations (∼10,000-50,000 cells) of nearby neurons in the neocortex 53,54. Due to the rapid nature of electrophysiological responses and the millisecond sampling rate of MEG hardware 55,56, it is frequently the method of choice for studying the neural basis of cognitive processes related to language comprehension. The target applications of this dataset are studies aiming to build and test novel encoding or decoding models 57 of spoken narrative comprehension (for English), to evaluate theories of narrative comprehension at various timescales (e.g. words, sentences, story), and to test current natural language processing models against brain data.

Methods
This report follows the established conventions for reporting MEG data 58,59 .
Participants. A total of 3 participants (1 female), aged 35, 30, and 28 years, were included in the study. All three participants were native English speakers (UK, US and South African English). All participants were right-handed and had normal or corrected-to-normal vision. The participants reported no history of neurological, developmental or language deficits. In the written informed consent procedure, they explicitly consented to the anonymized collected data being used for research purposes by other researchers. The study was approved by the local ethics committee (CMO -the local "Committee on Research Involving Human Subjects" in the Arnhem-Nijmegen region) and followed the guidelines of the Helsinki declaration.
Our participants were speakers of three distinct English dialects (UK, US and South African English). While this could be a potential source of inter-participant variability in neural responses, the nature of out-group accent effects in M/EEG remains debated 60. Importantly, we reason that accent-driven variability is less of an issue for the current dataset: although two participants listened to an out-group dialect (British English), their lack of exposure to it was likely not critical.
Finally, it is worth pointing out, given the emphasis on the number of data points within each participant, that this design decision introduces a trade-off in terms of what kinds of conclusions and inference can be achieved. Specifically, given the small number of participants, the current dataset is not well suited for group-level inference. In other words, if the goal of a potential analysis is to generalize a phenomenon across participants, the current data resource is best complemented by another resource that permits group-level inference. However, there is opportunity even for studies that primarily aim to achieve group-level generalization -namely, the large number of recordings within the same participant minimizes the sources of variability, which makes this resource a valuable starting point for exploratory analyses. Such data-driven, exploratory analyses can lead to concrete hypotheses which can then be tested in a confirmatory study, for example, on a dataset with a larger number of participants.

Stimulus materials. We used The Adventures of Sherlock Holmes by Arthur Conan Doyle as read by David Clarke and distributed through the LibriVox public library (https://librivox.org). The primary considerations in the selection of these stimulus materials were: a) the expectation of a relatively restricted or controlled vocabulary limited to real-life story contents (as opposed to, for example, highly innovative styles of writing or fantasy literature, where the dimensionality of semantic spaces can be expected to be higher), which made it reasonable to expect that models would be able to meaningfully capture the text statistics; b) a sufficient number of books available as plain text; and c) the availability of corresponding audiobooks (the plain text of The Adventures of Sherlock Holmes was obtained from https://sherlock-holm.es/stories/plain-text/advs.txt, accessed on September 11, 2018). Each individual story was further divided into subsections. The subsections were determined by us after reading through the stories. We made sure that the breaks occurred at meaningful text locations, for example, that prominent narrative events were not split across two runs. Stimuli and materials are available in the '/stimuli' folder in the top-level directory. The full specification of stimulus materials is available in Table 1 of the supplementary materials.

Word timing information.
To determine word onsets and offsets in every auditory story, we performed automatic forced alignment of the audio recordings and the text obtained from Project Gutenberg. We used the Penn Forced Aligner toolkit 61 with pretrained acoustic models for English. Tokenization of the story text was performed with the tokenizer module in the spaCy natural language processing library (https://spacy.io/api/tokenizer). Some of the tokenization rules result in tokens that do not lend themselves well to serving as units of analysis for the corresponding acoustic forms. Notably, the tokenizer by default will split contractions ("don't" -> "do" "not"). To be able to use contracted forms as inputs to the forced alignment algorithm, we post-edited the split contractions back to their original form. Word tokens not included in the precompiled model dictionary were added manually upon inspection of the forced alignment log files (e.g. proper names; see the 'dict.local' file for the full list). The arpabet pronunciation codes for missing tokens were generated using the web interface of the CMU LOGIOS Lexicon Tool (http://www.speech.cs.cmu.edu/tools/lextool.html). Symbols for stress in pronunciations were added manually to the generated arpabet pronunciation codes.
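The contraction post-editing step can be sketched as follows. This is a minimal illustration, not the authors' actual script; the function name and the exact set of fragment patterns are our assumptions (tokenizers commonly emit fragments such as "n't" or "'ll"):

```python
def merge_contractions(tokens):
    """Re-attach contraction fragments that a tokenizer splits off, so that
    forms like "don't" or "she'll" are restored before forced alignment.
    NOTE: illustrative sketch; the fragment patterns are assumptions."""
    merged = []
    for tok in tokens:
        # Fragments such as "n't", "'s", "'ll" begin with an apostrophe
        # (or are "n't" itself); they belong to the preceding token.
        if merged and (tok.startswith("'") or tok == "n't"):
            merged[-1] += tok
        else:
            merged.append(tok)
    return merged
```

For example, the token sequence ['do', "n't", 'worry'] would be restored to ["don't", 'worry'] before being passed to the aligner.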
Task and experimental design. Each of the 3 participants listened to the 10 stories from The Adventures of Sherlock Holmes. A separate MEG session consisted of listening to a single story from the collection. Each recording session took place on a separate day. A single run (i.e. an uninterrupted period of continuous data acquisition) consisted of participants listening to a subsection of a story (see Fig. 2). The order of story and run presentation was kept the same for all participants (see Table 1, supplementary information). Participants were instructed to listen attentively for comprehension. After each run, the participants answered comprehension questionnaires, reported on their literary experience, and were able to take short breaks. The experimenter was available for clarification prior to the beginning of each recording.
Each story was presented binaurally via a sound pressure transducer through two plastic tubes terminating in plastic insert earpieces. A black screen with a fixation cross was maintained while participants listened to the stories. Presentation of the auditory stories was controlled with Presentation software (version 16.4, build 06.07.13, NeuroBehavioral Systems Inc.).

Comprehension check. A comprehension check was used after each run to make sure participants were following the story contents. Each comprehension check consisted of 1 multiple-choice question per run with 3 possible answers. The questions were designed by us and should have been possible to answer correctly for anyone who had followed the stories with normal attention (example question: 'What is being investigated in the story?'). The participants indicated their response by means of a button box and had no time limit to do so. For the full questionnaire with answers see 'questions_tabular.txt'.

Table 1. P-values for comparisons in the canonical correlation analysis (CCA) of repeated segments (cca) and randomly selected segments (cca′). The 'within' and 'across' subscripts denote CCA performed on paired runs selected within the same recording session and runs selected across different recording sessions, respectively.

Fig. 1 The structure of the dataset. We recorded MEG from 3 participants, 10 separate recording sessions each. In each session, we recorded MEG data (275-channel axial gradiometer CTF system) while participants listened to audiobooks (The Adventures of Sherlock Holmes) in English. Along with the MEG data, we also tracked eye movements and pupil dilation. After story listening in each session, participants answered short behavioral questionnaires about their narrative comprehension experience. We also provide the timings of word onsets for every story in the dataset, which can be used to relate pretrained models (or other linguistic features) to the MEG data. The icons in the figures are from https://www.flaticon.com/authors/freepik.
Information density and absorption questions. After each run, the participants reported the perceived informativeness of the heard story subsection (by answering the question: 'How informative do you think this section was for the story development?') and indicated their level of absorption (i.e. level of agreement with the statement: 'I found this section very captivating'). They indicated perceived information density by rating their response on a visual scale from 1 ('Not at all informative') to 7 ('Very informative'). They indicated their level of absorption by rating their response on a visual scale from 1 ('Disagree') to 7 ('Agree').

Literary appreciation.
At the end of each recording session, the participants were asked to report their appreciation of the heard story. The appreciation questionnaire was the one used by 62, who adapted the version by 63.
The questionnaire consisted of a general score of story liking ('I thought this was a good story.'), thirteen statements with adjectives that indicated their impression of the story (e.g. I thought this story was... {Beautiful... Entertaining, Ominous}). Finally, they rated their agreement with 6 statements regarding their enjoyment of the story (adapted from 64; e.g. 'I was constantly curious about how the story would end'). Participants rated their story liking, the statements with adjectives, and the statements regarding their enjoyment on a 7-point scale ranging from 1 ('Disagree') to 7 ('Agree'). For the full questionnaire see Table 6.

Fig. 2 Trial structure. In each session, the participants listened to one story from the collection The Adventures of Sherlock Holmes. Each story was further split into subsections which correspond to experimental runs. The structure of one run is depicted here. After listening to the subsection, the participants answered a simple multiple-choice comprehension question and rated the information density and absorption levels of the heard subsection. After the responses, each participant listened to a short story snippet repeated twice. For the repeated section, the same stimulus (taken from a Sherlock Holmes story not used in the main set) was used after each run. The speaker icon was created by Pixel Perfect (https://www.flaticon.com/authors/pixel-perfect).

MEG data acquisition.
We recorded MEG data (275-channel axial gradiometer CTF system) while participants listened to audiobooks in English in a magnetically shielded room. The MEG signals were digitized at a sampling rate of 1200 Hz (the cutoff frequency of the analog anti-aliasing low-pass filter was 300 Hz). Throughout the recording, the participants wore an MEG-compatible headcast 65 (see Fig. 3). Per convention, three head localizer coils were attached to the inner side of the headcast at the nasion, left pre-auricular, and right pre-auricular sites. Head position was monitored during the recording using a custom-built monitoring tool 66.
Empty room recordings. Immediately before or after each recording session (depending on lab availability), we performed empty room recordings. These recordings lasted approximately 5 minutes. Empty room recordings are located in a separate '.ds' folder in the respective session folder (see Fig. 4, panel b).

Repeated stimulus recordings.
In between runs, we recorded MEG responses to a short (half minute) excerpt from The Adventures of Sherlock Holmes which was not used during the main task ('noise_ceiling.wav').The stimulus was repeated twice between runs.The MEG and Presentation trigger codes marking the onset and offset of the repeated stimuli have the values 100 and 150, respectively.
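These trigger codes can be used to locate the repeated-stimulus segments programmatically. A hedged sketch follows; the column names 'sample' and 'value' are our assumptions about the events-table layout, not a documented guarantee:

```python
import csv

def repeated_stimulus_spans(events_tsv, onset_code=100, offset_code=150):
    """Pair onset (code 100) and offset (code 150) trigger samples for the
    repeated 'noise_ceiling.wav' snippet from a tab-separated events file.
    Rows whose 'value' is not an integer trigger code are skipped."""
    onsets, spans = [], []
    with open(events_tsv, newline='') as f:
        for row in csv.DictReader(f, delimiter='\t'):
            try:
                value = int(row['value'])
            except (KeyError, ValueError, TypeError):
                continue
            if value == onset_code:
                onsets.append(int(row['sample']))
            elif value == offset_code and onsets:
                # Close the earliest open onset with this offset.
                spans.append((onsets.pop(0), int(row['sample'])))
    return spans
```

Each returned (onset, offset) sample pair delimits one presentation of the repeated snippet.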

MRI-MEG coregistration.
To co-register the structural MRI images to the MEG coordinate space, we first transformed the individual participant's MRI images from the voxel coordinate system to the ACPC head coordinate system by placing the origin of the head coordinate system at the anterior commissure, as interactively identified on the structural brain images (see http://www.fieldtriptoolbox.org/faq/anterior_commissure/). We then used the mesh describing the participant's head shape (see Fig. 3, panel b) and extracted from it the meshes corresponding to the nasion, left pre-auricular, and right pre-auricular fiducial coils. Once the meshes describing the coil geometry were extracted from the head shape mesh, we localized the center points of each of the three coils. These center points were taken to represent the locations where the fiducial coils were placed during the recordings (as the coils were actually placed in the empty slots at those positions in the geometric model). The extracted coordinate points were then manually inspected and appropriately labeled as the nasion, the left pre-auricular and the right pre-auricular points based on the signs and values of their x, y, and z coordinates. This procedure allowed us to coregister the MRI image to the MEG coordinate space. The outcome of the coregistration for each participant is shown in Fig. 3.
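The sign-based labeling of the extracted coil centers can be sketched as follows, assuming CTF head coordinates in which +x points toward the nasion and +y toward the left ear. This is a simplified stand-in for the manual inspection step described above; the function name is ours:

```python
def label_fiducials(points):
    """Label three coil-center coordinates ((x, y, z) tuples) as nasion,
    left pre-auricular (lpa) and right pre-auricular (rpa) points, assuming
    CTF head coordinates (+x anterior, +y left). Illustrative sketch only."""
    pts = sorted(points, key=lambda p: p[0], reverse=True)
    nasion = pts[0]                       # most anterior point
    ears = pts[1:]
    lpa = max(ears, key=lambda p: p[1])   # left ear: largest (positive) y
    rpa = min(ears, key=lambda p: p[1])   # right ear: smallest (negative) y
    return {'nasion': nasion, 'lpa': lpa, 'rpa': rpa}
```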

Eye-tracking data acquisition.
Concurrently with the MEG, we recorded participants' eye movements. We used the EyeLink 1000 Eye Tracker (SR Research) at a sampling rate of 1000 Hz. A 9-point scheme was used for calibration after positioning the participant within the MEG dewar and prior to starting the MEG data acquisition. The participant's left eye was tracked in all cases.

Data Records
The dataset can be accessed at the data repository of the Donders Institute for Brain, Cognition and Behaviour 67. The dataset is shared under the Data use agreement for identifiable human data (version RU-DI-HD-1.0, https://data.donders.ru.nl/doc/dua/RU-DI-HD-1.0.html?3), developed by the Donders Institute and Radboud University, which specifies the conditions and restrictions under which the data are shared. The dataset organization follows a BIDS-like specification for storing and sharing MEG data 68. The organization of the dataset directory is presented in Fig. 4. The three folders at the highest level (Fig. 4a) contain the data for the three participants (sub-001, sub-002, and sub-003). Each participant directory contains data folders for the respective sessions (ses-001, ses-002, etc.) with a subfolder for each data modality ('eyelink' for eye-tracking data, 'meg' for MEG data, etc.). The organization of session-specific directories is displayed in panel B of Fig. 4. The contents of the individual folders in the dataset directory are briefly described below.

Code ('/code').
The code folder contains the source MATLAB code for the analyses presented in this report and additional wrapper scripts that -together with shared preprocessing data -represent minimal working examples of the present analysis pipeline. The code base relies heavily on the routines of the FieldTrip toolbox 69. The high-level '/code' folder contains two subfolders. The '/pipeline' subfolder contains the MATLAB scripts and functions that were used in preprocessing and the main analyses. The code is further grouped into 'audio', 'meg', 'models', and 'utilities' folders, each containing scripts and functions for the respective preprocessing steps. The '/plots' folder contains the two scripts that were used to plot the figures for the technical validation below. Further information about code usage is provided in the 'README.txt' files in the 'code/pipeline/meg' and 'code/plots' folders.

Stimulus folder ('/stimuli').
The '/stimuli' folder contains the '.wav' audio files used in the recordings and the corresponding text files (see Stimulus Materials). The naming of the files follows the '%02d_%d' format, where the first number (zero-padded to two digits) marks the session and the second number (not zero-padded) codes the run number within that session (see Table 1 in Supplementary materials). In addition, the '/stimuli' folder also contains the input ('tokens.txt') and the output ('pfa.txt') of the forced alignment (see Section 'Word Timing Information'), providing word onset timings for all the words in the input token lists. The timing information is also available in the '*_events.tsv' sidecar files (see 'Raw MEG data folder' below).
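The '%02d_%d' naming scheme can be reproduced programmatically; a small sketch (the function name is ours):

```python
def stimulus_basename(session, run):
    """Build the '%02d_%d' stimulus file stem: a session number zero-padded
    to two digits, an underscore, and the unpadded run number
    (e.g. session 1, run 3 -> '01_3')."""
    return '%02d_%d' % (session, run)
```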

Derivatives folder ('/derivatives').
The '/derivatives' folder contains the outputs of specific preprocessing steps related to the anatomical and MEG data that are used in the technical validation below and that can potentially be of use in further analysis pipelines. The cortical parcellations are provided at 3 different resolutions (32k, 8k, and 4k source points per hemisphere). The analyses in this report are based on the 8k-resolution parcellation scheme. Finally, we provide two inflated cortical surface descriptions that were used for visualization purposes here ('cortex_inflated.mat' and 'cortex_inflated_shifted.mat').
Raw MEG data folder ('/sub-00X/ses-00Y/meg'). The '/meg' folder contains two raw MEG '.ds' datasets: the task-based narrative comprehension recording (with the infix 'task-compr') and the session-specific empty room recording ('task-empty'). In addition, the folder contains several BIDS sidecar files with meta-information about the datasets: a '.json' file with MEG acquisition parameters ('*_meg.json'), a tab-separated table with MEG channel information ('*_channels.tsv'), and a table containing detailed timing information about relevant events that occurred during the measurements ('*_events.tsv'), specifically word and phoneme onset times as obtained from the forced alignment procedure (see Table 4).

Technical Validation
All participants were monitored during data acquisition to ensure task compliance and general data quality. As a technical validation, we analyze the amount of head movement for each of the three participants and compare it to a dataset recorded without headcasts. We also perform a basic auditory evoked-response analysis and source localization for every participant and every session.

Head movement.
In a previous study, Meyer et al. 65 showed that it was possible to reposition the absolute head position to within 0.6 SD across 10 repositionings of a participant wearing a headcast. Here, we report the displacement in the x (anterior-posterior), y (left-right), and z (up-down) directions of the circumcenter of the nasion, left, and right coils. We extracted the head coil localization ('HLC*') channels and epoched the session-specific dataset into trials of 1-minute length. We first computed the mean absolute displacement in the x, y, and z directions across each 1-minute trial. We then computed the circumcenter of the nasion, left, and right coils in the x, y, and z dimensions per trial. This resulted in a measure of head position and orientation in x, y and z coordinates per trial. Finally, we centered the obtained head position values across the trial dimension by subtracting the mean head position in each direction.
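The epoching and centering steps can be sketched in a few lines. This is a simplified illustration: we use the centroid of the three coils in place of the circumcenter computed in the actual pipeline, and the array layout is our assumption:

```python
import numpy as np

def centered_head_position(coils, samples_per_trial):
    """coils: array of shape (3 coils, n_samples, 3 xyz) with continuous HLC
    coil positions. Epochs the recording into fixed-length trials, averages
    over coils and samples within each trial (centroid stand-in for the
    circumcenter), and centers by the session mean, yielding per-trial head
    displacement in x, y and z."""
    n_trials = coils.shape[1] // samples_per_trial
    trimmed = coils[:, :n_trials * samples_per_trial, :]
    trials = trimmed.reshape(3, n_trials, samples_per_trial, 3)
    per_trial = trials.mean(axis=(0, 2))        # (n_trials, 3) head position
    return per_trial - per_trial.mean(axis=0)   # centered displacement
```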
In Fig. 5, we report the average head position displacement across the first minute of recording. The figure shows that head positioning at the onset of a new session was achieved with an accuracy of within 1 millimeter, and for the most part within 0.5 millimeters, which is in line with previous reports 65.
In Fig. 6, we show head movement data from a previously published dataset recorded without a headcast 70 and from the current dataset. The dataset without a headcast shows clear movement displacement in all directions within the reported limits of 5 mm. In the current dataset, head movement was maintained below 1 millimeter throughout the recordings. This holds across all three participants (Fig. 7).

Evoked responses.
To provide an impression of the MEG data in the dataset, we perform a basic analysis of auditory evoked responses 71 for every participant and every session in the dataset.
Source reconstruction. Cortical sheet reconstruction. To localize the fiducial coils (the nasion, left and right ear) on participants' MRI images in the MNI coordinate space, we used the position information of where the digitized fiducials were placed on the headcasts in CTF space (see Section 'MRI-MEG coregistration'). After co-registration, we used the Brain Extraction Tool 72 from the FSL command-line library (v5.0.9) 73 to delete the non-brain tissue (skull stripping) from the whole head. To obtain a description of each individual participant's cortical sheet, we performed cortical surface reconstruction with the FreeSurfer image analysis suite, which is documented and freely available for download online (http://surfer.nmr.mgh.harvard.edu/), using the surface-based stream implemented in the 'recon-all' command-line tool. The post-processing of the reconstructed cortical surfaces was performed using the Connectome Workbench 'wb_command' command-line tools (v1.1.1; https://www.humanconnectome.org/software/workbench-command).
Beamformer. The cortical sheet reconstruction procedure described above resulted in a description of each individual participant's locations of potential neural sources along the cortical sheet (source model), with 7,842 source locations per hemisphere. We used a single-shell volume conduction model based on a realistically shaped surface of the inside of the skull 74 to compute the forward projection matrices (leadfields). We used a common leadfield for estimating session-specific beamformer weights. To estimate MEG source time series, we used linearly constrained minimum variance (LCMV) spatial filtering 75, deployed with the 'ft_sourceanalysis' routine. Source reconstruction was performed separately for each recording session. Data from all runs in a session were used to compute the data covariance matrix for the beamformer computation. Source parcels (groupings of source points into brain areas or parcels) were created using a refined version of the Conte69 atlas (brainvis.wustl.edu/wiki/index.php//Caret:Atlases/Conte69?_Atlas), which provides a parcellation of the neocortical surface based on Brodmann's cytoarchitectonic atlas. The original atlas, consisting of 41 labeled parcels per hemisphere, was further refined to obtain 187 equally sized parcels, observing the parcel boundaries of the original atlas.
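For a single source location, the LCMV filter has the standard closed form W = (Lᵀ C⁻¹ L)⁻¹ Lᵀ C⁻¹, where L is the leadfield and C the sensor covariance (van Veen et al.; 'ft_sourceanalysis' implements this internally). A minimal numerical sketch; the regularization choice is ours, not necessarily the setting used in the pipeline:

```python
import numpy as np

def lcmv_weights(leadfield, cov, reg=0.05):
    """Unit-gain LCMV spatial filter for one source location.
    leadfield: (n_channels, n_orientations); cov: (n_channels, n_channels).
    The covariance is regularized by a fraction of its mean sensor variance
    (an assumption for this sketch)."""
    n = cov.shape[0]
    c = cov + reg * (np.trace(cov) / n) * np.eye(n)
    ci = np.linalg.inv(c)
    lt_ci = leadfield.T @ ci                          # L^T C^-1
    return np.linalg.solve(lt_ci @ leadfield, lt_ci)  # (L^T C^-1 L)^-1 L^T C^-1
```

The unit-gain property (W L = I) follows directly from the closed form and provides a quick sanity check on any implementation.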
Event-related fields. The outcome of the preprocessing and source-reconstruction steps is a set of MEG time series for 370 brain parcels across the left and right hemispheres. To obtain the average event-related field (ERF) for each source parcel, we first concatenated all runs within each session. We then epoched the MEG time series in windows starting 100 ms prior to word onset and extending to 500 ms post word onset. We then computed the average across epochs. This results, for each participant and each recording session, in an average representation of the word-onset evoked signal for every source parcel. The results for session 1 of each participant are displayed in Fig. 8.
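The epoch-and-average step can be sketched as follows (a minimal illustration with an assumed array layout; onsets are given in samples):

```python
import numpy as np

def word_onset_erf(data, onsets, sfreq, tmin=-0.1, tmax=0.5):
    """Average word-onset evoked response from a (n_parcels, n_samples)
    array. Cuts epochs from tmin to tmax (seconds) around each onset sample
    and averages across epochs, dropping epochs that extend beyond the
    recording."""
    pre = int(round(tmin * sfreq))
    post = int(round(tmax * sfreq))
    epochs = [data[:, o + pre:o + post] for o in onsets
              if o + pre >= 0 and o + post <= data.shape[1]]
    return np.mean(epochs, axis=0)   # (n_parcels, post - pre)
```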
For all three participants, the ERFs show the expected temporal profile, with early peaks at approximately 50 ms post word onset (Fig. 8, right). Inspection of the source topographies at selected latencies (orange dashed lines in the right-hand panels) shows clear focal topographies, with peak activations localized in the primary auditory cortex in the superior temporal gyrus. The patterns are right-lateralized for participants 1 and 2, whereas participant 3 shows a more bilateral pattern. Such inter-individual variability in brain activation patterns during auditory language comprehension has been reported in MEG previously 76. The ERFs show a broader range of frequencies over time than is perhaps typically shown in an ERP/ERF analysis. It should be noted that we use a rather broad bandpass filter (0.5-40 Hz) compared to other reports (e.g. Broderick et al. 77 filter at 1-8 Hz). Finally, and importantly, the results are robust and consistent across all ten sessions in each participant. The results for the other sessions for all participants are shown in Figs. 9-11.

Repeat reliability.
After each run, we recorded two short 30-second snippets repeated one after another (see Fig. 2). This repeated design across runs, sessions and participants allows for estimation of signal reliability that can inform the main analyses of the story runs. As a basic quality analysis, we performed a canonical correlation analysis (CCA) in which we train a linear model f_r on a pair of repeated runs, Y_train = [Y_r1, Y_r2]. The runs in the pair r are taken either from the same session (within-session condition) or from two different sessions (across-session condition). The model f_r trained on a pair of repeats r learns a set of linear weights W_r that maximize the correlation between the two repeats in the pair. We then evaluate the model f_r on a held-out pair of repeats p ≠ r that was not used in the training step. Specifically, model performance is quantified by computing the correlation between the data segments in the pair p after the model is applied: corr(W_r Y_p1, W_r Y_p2). The rationale behind this evaluation procedure is that the presence of any shared brain features uncovered by the CCA model f_r would lead to a non-zero correlation when the model is applied to unseen data. We compare the model trained on repeats, f_r, against a baseline CCA model trained on pairs of 30-second segments selected at random from the story runs (i.e. each 30-second segment was recorded in response to a different input). Put differently, the brain signal in the repeated segments (more precisely, the canonical variates) is predominantly expected to be driven by the shared components due to input repetition, whereas randomly selected paired segments are expected to contain less of this shared signal.
We show (see Table 1) that there was a main effect of 'segment type': canonical variates obtained from repeated segments were significantly different from canonical variates obtained from randomly sampled segments (sampled either within or across sessions). However, there was no effect of 'session': canonical variates based on segments sampled within sessions did not differ significantly from those based on segments sampled across sessions (regardless of whether these were randomly selected story segments or repeated segments). This confirms that neural responses to repeated segments contain a larger shared signal component due to input repetition and, importantly, that this is consistent across sessions in all three participants.

Usage Notes
Interpretation of behavioral log files ('_beh.txt'). Each row in the tab-separated value file corresponds to a run in the session and contains the responses to the behavioral questions that were answered after each run. A description of each variable is given in Table 2.
Interpretation of the 'events.tsv' file. The 'events.tsv' file logs the events read from the MEG header files.
In addition, the 'events.tsv' file contains the aligned events from the '.log' files, which record the events generated by the Presentation experimental scripts. The event onsets are adjusted for the estimated delays between when Presentation sent a trigger and when it was recorded in the CTF acquisition system. Not all triggers from the Presentation script were sent to the CTF trigger channel. For completeness, we provide the mapping between the experimentally relevant codes in the Presentation log files and the CTF trigger channel in Table 3. An illustrative (edited) example of the 'events.tsv' file is shown in Table 4.
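As a usage sketch, such a file can be read with pandas. The excerpt below is invented, and the column names ('onset', 'duration', 'value') follow the general BIDS events convention rather than being copied from this dataset, so adjust them to the actual header.

```python
import io
import pandas as pd

# Minimal made-up excerpt in a BIDS-style events.tsv layout (tab-separated).
# 'onset'/'duration' are standard BIDS columns; 'value' is assumed here to
# hold the trigger code. The real dataset's columns may differ.
tsv = io.StringIO(
    "onset\tduration\tvalue\n"
    "12.345\t0.0\t11\n"
    "615.200\t0.0\t12\n"
)
events = pd.read_csv(tsv, sep="\t")

# Decode the two-digit trigger scheme described in Table 3:
# first digit = run index, second digit = event identity.
events["run"] = events["value"] // 10
events["event_id"] = events["value"] % 10
```

Splitting the trigger code into run index and event identity up front makes it easy to select, say, all story-onset events for a given run.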

Additional recording channels.
In Table 5 we provide information about the additional channels recorded along with the MEG data.

Known Exceptions and issues
Bad channels. Due to technical issues in the lab at the time of recording, the following channels show unstable behavior for sub-003, ses-004: MRC23, MLO22, MRP5, MLC23. Researchers are advised to remove these channels during preprocessing.
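A minimal way to honor this advice in a preprocessing script is to keep the known bad channels in one place and filter them out. The helper below is a hypothetical sketch; with MNE-Python one could equivalently append these names to `raw.info['bads']` so downstream functions skip them.

```python
# Known bad channels per (subject, session), as listed in this section.
BAD_CHANNELS = {
    ("sub-003", "ses-004"): ["MRC23", "MLO22", "MRP5", "MLC23"],
}

def good_channels(channels, subject, session):
    """Return the channel list with the known bad channels removed."""
    bads = set(BAD_CHANNELS.get((subject, session), []))
    return [ch for ch in channels if ch not in bads]
```

Centralizing the exclusion list keeps session-specific exceptions out of the analysis code proper.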

Repeated run.
Due to technical issues, the experimental script crashed during run 3 for sub-003, ses-008. We restarted the experiment at run 3. This means that the parts of run 3 preceding the crash were listened to twice, and only the second iteration of run 3 is complete.
Spike-like artifact in two channels (sub-003, ses-003). In sub-003, ses-003, channel MRP57 shows uncharacteristic regular (approximately every 10 seconds or longer) but short-lived impulse-like events. We could not determine the origin of this artifact. In our experience, the artifact can be detected using established blind-source separation techniques (e.g. independent component analysis) and removed from the data. Given its sparse temporal nature, we consider it unlikely that this artifact significantly affects the quality of the dataset, but it warrants consideration nevertheless.
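Beyond ICA, the artifact's sparse, high-amplitude character means a simple robust-threshold detector can flag candidate events for visual inspection. The sketch below uses made-up data and an illustrative threshold, not values calibrated to MRP57.

```python
import numpy as np

def find_spikes(x, fs, z_thresh=8.0, min_gap_s=1.0):
    """Flag sample indices where the signal deviates strongly from baseline.

    Uses a median/MAD z-score so the spikes themselves do not inflate the
    noise estimate. z_thresh and min_gap_s are illustrative, not calibrated.
    """
    mad = np.median(np.abs(x - np.median(x))) * 1.4826  # robust std estimate
    z = (x - np.median(x)) / mad
    idx = np.flatnonzero(np.abs(z) > z_thresh)
    # Merge detections closer than min_gap_s into single events.
    events = []
    for i in idx:
        if not events or i - events[-1] > min_gap_s * fs:
            events.append(int(i))
    return events

# Synthetic check: noise with single-sample impulses roughly every 10 s,
# mimicking the behavior described for MRP57.
fs = 600
x = np.random.default_rng(1).standard_normal(60 * fs)
spike_at = np.arange(5 * fs, 60 * fs, 10 * fs)
x[spike_at] += 50.0
```

Flagged indices can then be cross-checked against the ICA components before deciding what to remove.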
Exceptions in appreciation measurements. For sub-001, ses-001, part of the appreciation questionnaire responses were corrected offline. The corrected behavioral responses are documented in the behavioral log file with the suffix '_offline'. These exceptions were due to an inadvertent wrong button press and were reported to the responsible researcher immediately after the session ended. The file containing the corrected entry was created in order to keep both the old and the new response explicitly logged.
The appreciation measurements for sub-003, ses-008 are recorded in two files (ses-008A and ses-008B) due to the experimental script crashing (see Section 'Repeated run').
Table 2.
Description of the variables in the behavioral log files ('_beh.txt'):
time: date and time when the session recording took place
story nr.: session number, an integer representing the story index that was listened to in that recording session
session nr.: run number, an integer representing the run index in that session
response: recorded response (button box) to the multiple-choice question (possible values: A, B or C)
correct: the correct response to the multiple-choice question (possible values: A, B or C)
ishit: whether or not the given response matches the true response (can be 1 or 0)
absorption_score: the response to the absorption question (possible values: 1 through 7)
density_score: the response to the information density question (possible values: 1 through 7)

Fig. 3
Fig. 3 (a) Example of headcast placement on a participant's head. (b) Example of the geometrical model of the head and fiducial coil positioning. This headshape model was used for headcast production. (c) The outcome of the coregistration procedure; shown are the head and source models relative to the MEG sensors.

Fig. 5
Fig. 5 Session-initial head displacement per participant in all 10 sessions.

Fig. 6
Fig. 6 Comparison of head movement for five sessions in a dataset recorded without a headcast (left) and in the current dataset (right).

Fig. 7
Fig. 7 Head movement per session and per participant.

Fig. 8
Fig. 8 Analysis of evoked responses for one session in the dataset. Right. Line plots show the averaged source time courses (ERFs) for all brain parcels (each line represents a brain parcel). Time point 0 on the time axis indicates the word onset. Left. The source topographies show the distribution of activations designated by the orange dashed line in the ERF time courses on the right. We selected the time points that approximately correspond to the peak activation of the earliest component after word onset.

Fig. 9
Fig. 9 Analysis of evoked responses for participant 1, sessions 2 through 10. Right. Line plots show the averaged source time courses (ERFs) for all brain parcels (each line represents a brain parcel). Time point 0 on the time axis indicates the word onset. Left. The source topographies show the distribution of average activation within the interval designated by the orange dashed line in the ERF time courses on the right. We selected the time points that approximately correspond to the peak activation of the earliest component after word onset.

Fig. 10 Fig. 11
Fig. 10 Analysis of evoked responses for participant 2, sessions 2 through 10. Right. Line plots show the averaged source time courses (ERFs) for all brain parcels (each line represents a brain parcel). Time point 0 on the time axis indicates the word onset. Left. The source topographies show the distribution of average activation within the interval designated by the orange dashed line in the ERF time courses on the right. We selected the time points that approximately correspond to the peak activation of the earliest component after word onset.

Table 3 .
Mapping between the experimentally relevant codes in the Presentation log files and the CTF trigger channel. The trigger values are two-digit codes, where the first digit codes the run index and the second digit codes the identity of the event (i.e. story onset, offset, etc.).

Table 4 .
A sample of an '*events.tsv' file. Each row logs one event relative to the MEG recording session onset. Time values are in seconds. For this example, we only show the first few events from session 1 and session 2 (hence the discontinuity in row indices).

Table 5 .
Additional recording channels in the MEG datasets.