SpEx: a German-language dataset of speech and executive function performance

This work presents data from 148 German native speakers (20–55 years of age) who completed several speaking tasks. These range from formal tests, such as word production tests, to more ecologically valid spontaneous tasks designed to mimic natural speech. The speech data are supplemented by performance measures on several standardised, computer-based executive functioning (EF) tests covering the domains of working memory, cognitive flexibility, inhibition, and attention. The speech and EF data are further complemented by a rich collection of demographic data documenting education level, family status, and physical and psychological well-being. Additionally, the dataset includes information on the participants' hormone levels (cortisol, progesterone, oestradiol, and testosterone) at the time of testing. The dataset is thus a carefully curated, expansive collection that spans different EF domains and includes both formal speaking tests and spontaneous speaking tasks, supplemented by valuable phenotypical information. It therefore offers a unique opportunity to perform a variety of analyses in the context of speech, EF, and inter-individual differences. To our knowledge, it is the first dataset of its kind in the German language. We refer to this dataset as SpEx, since it combines speech and executive functioning data. Researchers interested in conducting exploratory or hypothesis-driven analyses in the field of individual differences in language and executive functioning are encouraged to request access to this resource. Applicants will then be provided with an encrypted version of the data, which can be downloaded.


Participants
This dataset includes 148 healthy participants aged 20–55 years [mean age 37.2 ± 11.1; 53 males (mean age = 35.6 ± 10.7); 95 females (mean age = 38.61 ± 11.4)]. Eligible participants were native German speakers who had not acquired an additional language before starting school and who had no neurological or psychiatric diagnoses. Early bilinguals were therefore excluded from the study. Information on the ethnic make-up of the sample was not collected. Participants had different levels of education (finished secondary school = 4; professional school/job training = 45; finished high school with a university-entrance diploma = 43; university degree = 56). Recruitment took place in North Rhine-Westphalia (Germany) via social networks and the Forschungszentrum Jülich mailing list. Testing sessions took place at the Forschungszentrum Jülich and lasted 150–180 min, depending on the time needed for instructions and the speed at which participants performed the tests. Participants received a remuneration fee of €50.

Procedures
Data collection was performed by four examiners. All of them conducted several pilot tests and were instructed by the study leader to ensure a common standard. Each examiner gave standardised instructions before each test and answered any questions the participant had. The testing session included 4 speaking tasks and 14 EF tests. An overview of the acquired measures is given in Fig. 1. Additional details for each test are described below.

Speaking test battery
The speaking test battery consisted of well-known tools commonly used to test verbal abilities. It also included spontaneous speaking tasks designed to elicit discourse as close to natural speech as possible. The tests were selected to span a spectrum from formal speaking tests, such as word generation and picture naming, to less structured tasks, such as picture description and spontaneous speech. All speaking tasks were presented and automatically recorded using Presentation software (Neurobehavioral Systems, Inc.; Version 20.1, Build 12.04.17) on an HP ProBook 4730s, with a Logitech Stereo USB Headset serving as the microphone.

Verbal fluency (VF)
The VF tasks used in this study were based on the Regensburger Wortflüssigkeitstest 39 , which is equivalent to the English Controlled Oral Word Association Test 40 . The implementation used in this study comprised two types of VF, lexical and semantic, each consisting of three separate sub-tasks.
The lexical VF task consisted of two simple tasks in which participants were required to generate as many German words as possible starting with the letter "M" and the letter "K", respectively. The letters were selected according to the difficulty of searching for words with the respective initial letter. Words starting with "M" provide an abundant search space, whereas "K" represents a higher difficulty level because fewer words are available 39 . A third, more demanding task involved a switching component in which participants were required to alternate between words starting with the letter "G" and words starting with the letter "R". Each of the three tasks was performed for two minutes. Participants were not allowed to use proper nouns or to repeat words. Words with the same root were considered to be the same word and were therefore also not allowed. Additionally, a word was only considered correct if it could be found in a German book or newspaper. The instruction was given in German and was as follows: 'Bei dieser Aufgabe sollen Sie innerhalb von zwei Minuten möglichst viele verschiedene Wörter nennen, die mit dem Anfangsbuchstaben "M" beginnen. Dabei sollen Sie verschiedene Regeln beachten: Sie sollen nur Wörter nennen, die in einer deutschen Zeitung oder einem deutschen Buch verwendet werden könnten. Dabei sollen Sie keine Wörter mehrfach nennen. Die Wörter dürfen aber auch nicht mit dem gleichen Wortstamm beginnen, also "Müll, Mülleimer, Müllabfuhr, Mülltonne"/"Kerze, Kerzenschein, Kerzenständer, Kerzenlicht" gelten nur als ein Wort. Weiterhin dürfen Sie auch keine Eigennamen nennen, also "Miriam, Max, Madrid, Malta"/"Kerstin, Kurt, Köln, Kreta" gelten nicht. Bitte versuchen Sie, möglichst schnell viele verschiedene Wörter mit dem Anfangsbuchstaben "M/K" zu nennen.' 39 (English translation: 'In this task, please name as many different words as possible that begin with the letter "M" within two minutes. Please observe several rules: name only words that could be used in a German newspaper or a German book. Do not name any word more than once. The words must also not begin with the same word stem; for example, "Müll, Mülleimer, Müllabfuhr, Mülltonne"/"Kerze, Kerzenschein, Kerzenständer, Kerzenlicht" count as only one word. Furthermore, you may not name proper nouns; for example, "Miriam, Max, Madrid, Malta"/"Kerstin, Kurt, Köln, Kreta" do not count. Please try to name as many different words beginning with the letter "M/K" as quickly as possible.')
The semantic VF task consisted of two simple tasks in which participants were required to name animals and jobs, respectively. The third task involved a switching component in which participants were required to alternate between naming fruits and sports. Each of the three tasks was performed for two minutes, and the rules specified for the lexical VF task still applied. The instruction was given in German and was as follows: 'Bei dieser Aufgabe sollen Sie innerhalb von zwei Minuten möglichst viele verschiedene Wörter aus der Kategorie "Tiere"/"Berufe" nennen. Dabei sollen Sie keine Tiere mehrfach nennen. Bitte versuchen Sie, möglichst schnell viele Tiere/Berufe zu nennen.' (English translation: 'In this task, please name as many different words as possible from the category "animals"/"jobs" within two minutes. Do not name any animal more than once. Please try to name as many animals/jobs as quickly as possible.')
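The lexical VF rules above (correct initial letter, no repetitions, no proper nouns, one word per word stem) lend themselves to an automatic pre-check of transcribed responses. The following Python sketch is a hypothetical scorer and should not be read as the RWT scoring procedure or as the authors' method. The name `score_vf` is illustrative, and proper nouns must be supplied by the user. "Same word stem" is approximated by a crude prefix test, which will over-merge some unrelated pairs (e.g. "Mut" and "Mutter"). The sketch does not check the alternation required in the switching tasks or the book/newspaper criterion.

```python
def score_vf(responses, letter, proper_nouns=frozenset()):
    """Hypothetical pre-check of one lexical VF run (single initial letter).

    Simplifications (assumptions, not the RWT scoring rules):
    - comparisons are case-insensitive;
    - proper nouns must be supplied by the caller as a lower-case set;
    - "same word stem" is approximated by a prefix test.
    Returns the number of accepted words and a count of each error type.
    """
    accepted = []
    errors = {"wrong_letter": 0, "proper_noun": 0, "repetition": 0, "same_stem": 0}
    for raw in responses:
        word = raw.strip().lower()
        if not word.startswith(letter.lower()):
            errors["wrong_letter"] += 1
        elif word in proper_nouns:
            errors["proper_noun"] += 1
        elif word in accepted:
            errors["repetition"] += 1
        elif any(word.startswith(w) or w.startswith(word) for w in accepted):
            errors["same_stem"] += 1  # e.g. "mülleimer" after "müll"
        else:
            accepted.append(word)
    return len(accepted), errors


score, errors = score_vf(
    ["Müll", "Mülleimer", "Maus", "Max", "Maus", "Kerze"], "M", proper_nouns={"max"}
)
print(score, errors)  # 2, with one error of each type
```

Manual review of the transcripts would still be needed before treating such counts as VF scores.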
As with the lexical task, the wording of this instruction follows the Regensburger Wortflüssigkeitstest 39 .

Picture-word interference paradigm
Participants were shown 64 different pictures (obtained from 41 ), each accompanied by a spoken word that was either semantically related or semantically unrelated to the picture. Each picture was shown for 4500 ms and was followed by a fixation cross shown for 3000 ms. The auditory stimuli were spoken by a 23-year-old German woman. Participants were required to name the picture as quickly as possible, and recording began as soon as the picture faded in. The list of picture names and auditory distractors is provided as a separate file in the data repository. During item selection, targets and distractors were chosen to have different onsets, and distractor items were not used as target items. Moreover, items were controlled for frequency and semantic relatedness using GermaNet Pathfinder 42 .
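Because recording started at picture onset, the naming latency of a trial can be approximated by the time of the first audio frame that exceeds a loudness threshold. The sketch below assumes samples have already been converted to floats in [-1, 1]. The 10 ms frame length and the RMS threshold of 0.02 are arbitrary illustrative values, not values taken from the study, and `naming_latency_ms` is a hypothetical helper. Simple energy thresholds are sensitive to breathing and headset noise, so the threshold would need tuning, or the latencies checking by hand.

```python
import math


def naming_latency_ms(samples, sample_rate=44100, frame_ms=10, threshold=0.02):
    """Onset (ms from recording start) of the first frame whose RMS exceeds
    `threshold`, or None if no frame does. `samples` are floats in [-1, 1]."""
    frame_len = int(sample_rate * frame_ms / 1000)
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms > threshold:
            return 1000.0 * start / sample_rate
    return None


# Synthetic check: 600 ms of silence followed by 200 ms of a 200 Hz tone.
sr = 44100
signal = [0.0] * (sr * 6 // 10) + [
    0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 5)
]
print(naming_latency_ms(signal, sr))  # 600.0
```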

Picture description task
Participants were shown the Cookie Theft Picture (obtained from 43 ) and asked to describe it in as much detail as possible within 90 s. The instruction was given in German and was as follows: "Bitte beschreiben Sie in 90 Sekunden dieses Bild so ausführlich wie möglich." (English translation: "Please describe this picture in as much detail as possible in 90 seconds.") Participants' descriptions were recorded for 90 s.

Spontaneous speaking task
Participants were first told that they would be asked two questions, each of which they should answer in as much detail as possible for about 5 min. This instruction was given in German and was as follows: "Ich werde Ihnen heute insgesamt zwei Fragen stellen, bei denen ich Sie bitte, etwas ausführlicher für ca. 5 min zu antworten." (English translation: "Today I will ask you two questions in total, and I would ask you to answer each in some detail for about 5 min.") The first question required participants either to describe a book they had read recently or to talk about something they had watched on television the night before. If participants could not respond to this, they were asked to report any events from the past few weeks. The question was asked in German and was as follows: "Was haben Sie gestern Abend im Fernsehen geschaut oder welches Buch haben Sie gelesen?" (English translation: "What did you watch on television last night, or which book have you read?") The second question required participants to describe a vacation they would like to take if time and money were no object. The question was asked in German and was as follows: "Wo und wie würden Sie Ihren schönsten Urlaub verbringen? Erzählen Sie uns etwas darüber." (English translation: "Where and how would you spend your dream vacation? Tell us something about it.") Both answers were recorded for 5 min each. When participants did not speak for that long, the examiners asked them for more detail.

EF test battery
The EF test battery consisted of computerised versions of commonly used neuropsychological tests covering different EF subdomains. These were taken either from the SCHUHFRIED Wiener Testsystem or from PsyToolkit (https://www.psytoolkit.org/experiment-library/mackworth.html; 44,45 ). The tests were chosen to capture a broad range of subdomains of cognitive performance, such as cognitive flexibility, planning, working memory, attention, and inhibition. Although the general areas covered by the tests overlap, each test has properties that distinguish it from the others in the battery. Table 1 lists the specific battery and version used for each test. Table 2 gives an overview of the variables measured for each test, together with descriptive statistics (mean, standard deviation, and range). The shared values are the raw data for each test.
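Table 2 reports the mean, standard deviation, and range of each raw variable. A minimal Python sketch of recomputing such summaries from one column of a raw-data file is shown below. It assumes that missing values are represented as `None` and that the sample (n − 1) standard deviation is used; this descriptor specifies neither. The name `describe` is illustrative.

```python
import statistics


def describe(values):
    """Mean, sample SD (n - 1) and range of one raw EF variable; None = missing."""
    clean = [v for v in values if v is not None]
    return {
        "n": len(clean),
        "mean": statistics.mean(clean),
        "sd": statistics.stdev(clean),
        "min": min(clean),
        "max": max(clean),
    }


print(describe([1, 2, None, 3, 4]))
```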

Corsi block tapping test (CORSI)
Participants were presented with nine cubes arranged irregularly on the screen. A pointer then indicated three of the cubes in a specific order. At the end of this sequence, a signal sounded, prompting participants to repeat the sequence. The length of the sequence was increased by one cube each time a participant reproduced the sequence correctly.
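The adaptive procedure described above starts with three cubes and lengthens the sequence by one after each correct reproduction. It can be sketched as follows. Three details are assumptions made for illustration: the stopping rule (the test ends at the first incorrect reproduction), the use of distinct cubes within a sequence, and the function names. The Wiener Testsystem implementation may differ, for example by allowing several attempts per sequence length.

```python
import random


def run_corsi(reproduce, n_blocks=9, start_len=3, seed=0):
    """Simulate the adaptive Corsi procedure and return the span, i.e. the
    longest sequence reproduced correctly (0 if the first one fails).

    reproduce: callable taking the target sequence (list of cube indices)
    and returning the participant's reproduction.
    Assumed stopping rule: the first incorrect reproduction ends the test.
    """
    rng = random.Random(seed)
    length, span = start_len, 0
    while length <= n_blocks:
        target = rng.sample(range(n_blocks), length)  # distinct cubes (assumption)
        if reproduce(target) != target:
            break
        span = length
        length += 1  # one more cube after each success, as described above
    return span


# Simulated participant who can reproduce at most 5 cubes in order.
def participant(seq):
    return list(seq) if len(seq) <= 5 else list(reversed(seq))


print(run_corsi(participant))  # 5
```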

Response inhibition (INHIB)
The test consisted of two parts. In the first part, an arrow was displayed on the screen, and participants were asked to respond according to the direction in which it pointed. In the second part, participants repeated the same task but were additionally asked to suppress their motor response whenever they heard an auditory signal.
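One common way to summarise such a task uses two measures. The first is the mean reaction time on trials without a stop signal. The second is the proportion of stop-signal trials on which the response was successfully withheld. The sketch below assumes a hypothetical trial format (dicts with `signal` and `rt_ms` keys). It ignores whether the arrow direction was answered correctly, and it is not the scoring used by the Wiener Testsystem.

```python
from statistics import mean


def summarise_inhibition(trials):
    """trials: dicts with hypothetical keys 'signal' (True if the auditory stop
    signal was played) and 'rt_ms' (response time in ms, None if no response)."""
    go_rts = [t["rt_ms"] for t in trials if not t["signal"] and t["rt_ms"] is not None]
    stop_trials = [t for t in trials if t["signal"]]
    withheld = sum(1 for t in stop_trials if t["rt_ms"] is None)
    return {
        "mean_go_rt_ms": mean(go_rts) if go_rts else None,
        "go_omissions": sum(1 for t in trials if not t["signal"] and t["rt_ms"] is None),
        "stop_success_rate": withheld / len(stop_trials) if stop_trials else None,
    }


trials = [
    {"signal": False, "rt_ms": 400},
    {"signal": False, "rt_ms": 500},
    {"signal": False, "rt_ms": None},  # omission on a go trial
    {"signal": True, "rt_ms": None},   # successfully withheld
    {"signal": True, "rt_ms": 450},    # failed to stop
]
print(summarise_inhibition(trials))
```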

Mackworth clock test (MACK)
Participants were presented with a large green clock hand on a black screen. The hand moved like the second hand of a clock, approximately once per second. At infrequent and irregular intervals, the hand made an irregular "jump". Participants were asked to detect these jumps and to react quickly by pressing a button. Each irregular jump covered around 10% of the circle. The test lasted 1 min and comprised 60 moves of the clock hand in total.
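Performance in a vigilance task like this is often summarised with signal-detection measures. The sketch below assumes that each key press has already been assigned to one of the 60 hand movements, which is a hypothetical preprocessing step. Hits, misses, false alarms, and correct rejections can then be counted per movement. d' is computed with the log-linear correction, which adds 0.5 to each count so that perfect hit rates or zero false-alarm rates stay finite. This is one conventional choice, not necessarily the scoring produced by PsyToolkit. The function names are illustrative.

```python
from statistics import NormalDist


def count_outcomes(jump_moves, response_moves, n_moves=60):
    """Classify each of the n_moves hand movements as hit, miss, false alarm,
    or correct rejection. Both arguments are 0-based movement indices;
    assigning a key press to a movement is assumed to have been done already."""
    jumps, responses = set(jump_moves), set(response_moves)
    hits = len(jumps & responses)
    misses = len(jumps - responses)
    false_alarms = len(responses - jumps)
    correct_rejections = n_moves - hits - misses - false_alarms
    return hits, misses, false_alarms, correct_rejections


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), log-linear corrected."""
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)


counts = count_outcomes(jump_moves=[5, 20, 40], response_moves=[5, 21, 40])
print(counts, round(d_prime(*counts), 2))
```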

N-back non-verbal test (NBN)
A sequence of 100 abstract figures was presented one at a time. Participants had to indicate whether the currently displayed figure was identical to the one shown two positions earlier (2-back paradigm). If it was, they were expected to press a button as quickly as possible.
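In a 2-back task, the ground-truth targets follow directly from the stimulus sequence: trial i is a target when its figure matches the figure at trial i − 2. The sketch below derives the targets and counts hits, misses, and false alarms from the (0-based) indices of trials with a button press. The input format and function names are assumptions.

```python
def two_back_targets(figures):
    """Indices of trials whose figure matches the one shown two places back."""
    return [i for i in range(2, len(figures)) if figures[i] == figures[i - 2]]


def score_nback(figures, pressed):
    """Compare 0-based indices of button presses with the 2-back targets."""
    targets, pressed = set(two_back_targets(figures)), set(pressed)
    return {
        "hits": len(targets & pressed),
        "misses": len(targets - pressed),
        "false_alarms": len(pressed - targets),
    }


figures = ["A", "B", "A", "C", "B", "C", "D", "D"]  # any hashable figure IDs
print(two_back_targets(figures))        # [2, 5]
print(score_nback(figures, [2, 6]))     # one hit, one miss, one false alarm
```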

Non-verbal learning test (NVLT)
Nonsensical, irregular, and geometric figures were presented on the screen. During the course of the test, some figures were shown multiple times. For each figure, participants had to decide whether it had already appeared or was being shown for the first time.

Participants were asked to press the "l" key if they read the word "rechts" (German for right) and the "a" key if they read the word "links" (German for left). Each word was displayed either on the right or on the left part of the screen, so the word could be congruent or incongruent with its position.
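Both tasks above have a ground truth that can be derived from the stimulus sequence. The first function labels each figure in the recognition task as old or new. The second computes a congruency effect for the "rechts"/"links" task: the mean reaction time on correct incongruent trials minus that on correct congruent trials. The tuple format and function names are hypothetical. The mean-difference score is a common convention rather than the test's documented output.

```python
def old_new_labels(sequence):
    """True where a figure has already appeared earlier in the sequence."""
    seen, labels = set(), []
    for fig in sequence:
        labels.append(fig in seen)
        seen.add(fig)
    return labels


def congruency_effect_ms(trials):
    """Mean RT (incongruent) minus mean RT (congruent), correct trials only.
    trials: (word, screen_side, rt_ms, correct) tuples, with word 'rechts' or
    'links' and screen_side 'right' or 'left'."""
    def congruent(word, side):
        return (word == "rechts") == (side == "right")

    con = [rt for w, s, rt, ok in trials if ok and congruent(w, s)]
    inc = [rt for w, s, rt, ok in trials if ok and not congruent(w, s)]
    return sum(inc) / len(inc) - sum(con) / len(con)


print(old_new_labels(["f1", "f2", "f1", "f3", "f2"]))
trials = [
    ("rechts", "right", 400, True),
    ("links", "left", 420, True),
    ("rechts", "left", 480, True),
    ("links", "right", 500, True),
    ("links", "right", 900, False),  # error trial, excluded
]
print(congruency_effect_ms(trials))  # 80.0
```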

Raven's standard progressive matrices (SPM)
Participants were shown eight separate elements that followed a pattern. The task required them to identify the one missing element from six choices to complete the pattern. The difficulty of pattern recognition increased over the course of the test.

Stroop interference test (STROOP)
Colour names were displayed on the screen in a colour that was incongruent with the name (e.g., the word "blau" (German for blue) printed in red). The test consisted of two conditions. In the naming condition, participants were asked to respond to the colour in which the word was printed. In the reading condition, they were asked to respond to the meaning of the word. A baseline measure of each participant's reaction speed and accuracy was established at the start of the test by presenting colour words without colouring or simple colour bars.
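Interference in this design is commonly expressed as the slowing relative to baseline. The sketch below takes the difference between the median reaction time in an interference condition and the median baseline reaction time. Applying it separately to the reading and naming conditions gives one score per condition. This descriptor does not state whether the Wiener Testsystem uses exactly this definition, so treat it as an assumption. The function name is illustrative.

```python
from statistics import median


def interference_ms(condition_rts_ms, baseline_rts_ms):
    """Median RT in an interference condition minus median baseline RT (ms)."""
    return median(condition_rts_ms) - median(baseline_rts_ms)


# Hypothetical RTs (ms) for one participant in the reading condition.
reading_interference = interference_ms([700, 650, 900], [500, 520, 480])
print(reading_interference)  # 200
```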

Cued task switching (SWITCH)
This task consisted of a shape task and a colour task. On every trial, a cue stimulus informed the participant which task to perform. The cue for the colour task was the word "COLOR", and the cue for the shape task was the word "SHAPE". In the colour task, participants were asked to respond to the colour of the presented figure while ignoring its shape. In the shape task, they were required to respond to the shape of the presented figure while ignoring its colour. Two keyboard keys were assigned to the answers: the "b" key was used for the answers circle and yellow, and the "n" key for the answers rectangle and blue. Participants pressed the key corresponding to their answer.
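The response mapping described above can be sketched as follows, together with the usual switch-cost measure, i.e. slower responses when the task changes from the previous trial. `KEY_FOR`, `correct_key`, and `switch_cost_ms` are illustrative names, and the trial tuple format is an assumption. The sketch excludes the first trial and error trials. Unlike many published analyses, it does not exclude trials that follow an error.

```python
# Key mapping as described in the text: "b" = circle or yellow, "n" = rectangle or blue.
KEY_FOR = {"circle": "b", "yellow": "b", "rectangle": "n", "blue": "n"}


def correct_key(cue, shape, colour):
    """Expected key for one trial; cue is 'COLOR' or 'SHAPE'."""
    return KEY_FOR[colour] if cue == "COLOR" else KEY_FOR[shape]


def switch_cost_ms(trials):
    """Mean RT on switch trials minus mean RT on repeat trials.
    trials: (cue, rt_ms, correct) tuples in presentation order; the first
    trial has no predecessor and error trials are skipped."""
    switch, repeat = [], []
    for prev, (cue, rt, ok) in zip(trials, trials[1:]):
        if ok:
            (switch if cue != prev[0] else repeat).append(rt)
    return sum(switch) / len(switch) - sum(repeat) / len(repeat)


print(correct_key("COLOR", "rectangle", "yellow"))  # b
trials = [("COLOR", 600, True), ("COLOR", 550, True), ("SHAPE", 750, True),
          ("SHAPE", 600, True), ("COLOR", 700, True)]
print(switch_cost_ms(trials))  # 150.0
```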

Trail making test (TMT)
The task consisted of two parts, A and B. In part A, the numbers 1 to 25 were displayed at random positions across the screen. Participants were asked to click on the numbers in ascending order as quickly as possible.
In part B, the numbers 1 to 13 were displayed together with the letters A to L, all at random positions. Participants had to click on numbers and letters alternately, each in ascending order.
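Both parts contain 25 targets; part B consists of 13 numbers plus the 12 letters A to L, and its correct order alternates between them. The sketch below generates that order and computes two commonly used derived scores, the difference B − A and the ratio B/A. These derived scores are conventions from the literature, not variables defined in this descriptor, and the function names are illustrative.

```python
import string


def part_b_sequence(n_numbers=13, n_letters=12):
    """Correct click order for part B: 1, A, 2, B, ..., 12, L, 13."""
    letters = string.ascii_uppercase[:n_letters]
    seq = []
    for i in range(n_numbers):
        seq.append(i + 1)
        if i < n_letters:
            seq.append(letters[i])
    return seq


def tmt_scores(time_a_s, time_b_s):
    """Common derived scores: difference (B - A) and ratio (B / A)."""
    return {"b_minus_a": time_b_s - time_a_s, "b_over_a": time_b_s / time_a_s}


print(part_b_sequence()[:6])  # [1, 'A', 2, 'B', 3, 'C']
print(tmt_scores(30, 75))     # difference 45, ratio 2.5
```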

Tower of London (TOL)
Participants were presented with an image depicting a three-dimensional wooden model with three rods, on which three balls of different colours were placed. The left rod holds three balls, the middle rod two balls, and the right rod one ball. Participants were asked to move the balls from the starting state to a target state using the minimum number of moves. The target state was always shown in the upper part of the screen and the starting state in the lower part. The minimum number of moves required was shown to the left of the starting state. Several rules had to be observed, including that only one ball could be moved at a time.
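The minimum number of moves shown to participants can be recomputed with a breadth-first search over ball configurations. The search respects the rod capacities of three, two, and one balls and the one-ball-at-a-time rule. In this sketch, a state is a tuple of three rods, each listed from bottom to top, and ball labels are arbitrary strings. It is an illustration, not the test software's implementation.

```python
from collections import deque

CAPACITY = (3, 2, 1)  # left, middle, right rod, as described above


def successors(state):
    """All states reachable by moving one top ball to another rod with room."""
    for src in range(3):
        if not state[src]:
            continue
        for dst in range(3):
            if dst != src and len(state[dst]) < CAPACITY[dst]:
                rods = [list(rod) for rod in state]
                rods[dst].append(rods[src].pop())
                yield tuple(tuple(rod) for rod in rods)


def min_moves(start, goal):
    """Breadth-first search: fewest single-ball moves from start to goal."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        state, depth = frontier.popleft()
        if state == goal:
            return depth
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


start = (("red", "green", "blue"), (), ())
goal = (("red",), ("green",), ("blue",))
print(min_moves(start, goal))  # 2
```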

Perception and attention functions test: divided attention (WAFG)
The participants were required to focus on two geometric figures and one auditory stimulus. At certain intervals, the stimuli changed their intensity (i.e., a figure became lighter and/or the auditory stimulus became louder). The participants were asked to respond when two stimuli became lighter/louder twice in succession.

Perception and attention functions test: spatial attention & neglect (C)
Four triangles were presented at four spatial positions. The participants were required to react when a triangle changed its intensity (i.e., became darker). In the neglect test, an interfering or matching visual cue was also given.

Wisconsin card sorting test (WCST)
The task used here is not the actual Wisconsin Card Sorting Test as copyrighted in the US, but a computer-based task inspired by the original test 46 . Four stimulus cards showing different geometric figures were presented. The figures on the cards differed in number, colour, and form. Participants had to work out the classification rule in order to match each newly presented card to one of the four cards. They received feedback for every card they matched. The classification rule changed every 10 cards, requiring participants to shift rules accordingly.
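The matching logic described above can be expressed compactly. The sketch below assumes that the rule cycles through colour, form, and number in that order; the actual order used by the test is not given here. It also assumes that each card is represented as a dict of its three attributes. `active_rule` and `correct_pile` are hypothetical names. The stimulus cards in the usage example are illustrative and may differ from those used in the test.

```python
RULES = ("colour", "form", "number")  # hypothetical order of sorting rules


def active_rule(card_index, run_length=10, rules=RULES):
    """Sorting rule in force for a 0-based card index; it changes every 10 cards."""
    return rules[(card_index // run_length) % len(rules)]


def correct_pile(card, stimulus_cards, rule):
    """Index of the stimulus card that matches `card` on the active dimension."""
    for i, stim in enumerate(stimulus_cards):
        if stim[rule] == card[rule]:
            return i
    return None


stimulus_cards = [
    {"colour": "red", "form": "triangle", "number": 1},
    {"colour": "green", "form": "star", "number": 2},
    {"colour": "yellow", "form": "cross", "number": 3},
    {"colour": "blue", "form": "circle", "number": 4},
]
card = {"colour": "blue", "form": "star", "number": 1}
for i in (0, 10, 20):
    rule = active_rule(i)
    print(i, rule, correct_pile(card, stimulus_cards, rule))
```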

Additional data
In addition to the main set of speaking and EF tasks, phenotypical data were collected through questionnaires. These included the German version of the Beck Depression Inventory (BDI-II 47 ), used to collect information on depressive symptoms, and the NEO Five-Factor Inventory (NEO-FFI 48 ). Furthermore, before the testing session began, participants were asked general questions about their background, habits, and physical and psychological well-being. Saliva samples were collected at the beginning and at the end of the test session, stored in a refrigerator, and sent to an external lab. There, the two saliva samples of each participant were pooled, and quantification analyses for cortisol, progesterone, oestradiol, and testosterone were carried out.
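BDI-II total scores are the sum of 21 items, each scored 0 to 3, giving a range of 0 to 63. The sketch below computes the total and maps it to the severity bands of the original English-language BDI-II manual: 0 to 13 minimal, 14 to 19 mild, 20 to 28 moderate, and 29 to 63 severe. The interpretation guidelines of the German manual should be checked before these bands are applied to this sample. It is also an assumption that the questionnaire folder holds item-level responses. The function names are illustrative.

```python
def bdi_ii_total(item_scores):
    """Sum of the 21 BDI-II items, each scored 0-3 (total range 0-63)."""
    if len(item_scores) != 21 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("expected 21 item scores between 0 and 3")
    return sum(item_scores)


def bdi_ii_severity(total):
    """Severity band per the original BDI-II manual cut-offs (assumption for this sample)."""
    if total <= 13:
        return "minimal"
    if total <= 19:
        return "mild"
    if total <= 28:
        return "moderate"
    return "severe"


total = bdi_ii_total([1] * 21)
print(total, bdi_ii_severity(total))  # 21 moderate
```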

Data records/usage notes
The dataset presented in this paper is stored on GDPR-compliant, protected servers of the Forschungszentrum Jülich, housed at the Jülich Supercomputing Centre, as agreed upon by the participants. The dataset complies with the four basic FAIR principles. It is described with metadata that are accessible on Jülich DATA (https://data.fz-juelich.de/dataset.xhtml?persistentId=doi:10.26165/JUELICH-DATA/CHWZDZ), making it findable, accessible, interoperable, and reusable.
Researchers who wish to access the data are kindly asked to contact the authors at spexdata@fz-juelich.de. Applicants will be asked to submit an approved ethics application together with a project outline. Additionally, applicants will be asked to ensure that the requested data will be used only for the specified research project and will not be passed on to third parties. Once the request is approved, applicants will receive temporary access to an encrypted version of the data, which they can then download.
The dataset repository contains 76.81 GB of data in five main folders. Four of them contain the different measures depicted in Fig. 1 (i.e., EF data, speech data, hormone data, and questionnaires). The fifth folder contains publications that have already made use of the dataset 32,33 . The folder containing the EF data has a sub-folder for each of the tests used (i.e., the 14 tests listed in Tables 1 and 2). Each sub-folder contains a comma-separated-values file with the corresponding raw data. It also contains a text file with information about the specific measure, details on how it was acquired, and details of the hardware and software used for the acquisition. The folder containing the speech data, in turn, has a sub-folder for each participant. Each of these contains further sub-folders for each of the 6 speaking tasks. These in turn contain the corresponding raw audio files in the uncompressed RIFF WAVE (WAV) format. All speech utterances were recorded with a bit rate of 2822 kbit/s, a sample size of 32 bit, and a sampling rate of 44.1 kHz.
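The stated bit rate is consistent with two-channel recordings: 44,100 samples/s × 32 bit × 2 channels = 2,822,400 bit/s ≈ 2822 kbit/s. A 32-bit WAV file may store either integer PCM (format tag 1) or IEEE floating-point samples (format tag 3), and this descriptor does not say which. It is therefore worth checking the header before loading the audio; Python's standard `wave` module may not open floating-point files. The sketch below parses the `fmt ` chunk directly with `struct`. `read_wav_format` is a hypothetical helper.

```python
import struct


def read_wav_format(data: bytes) -> dict:
    """Parse the 'fmt ' chunk from the start of a RIFF/WAVE byte string."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        if chunk_id == b"fmt ":
            tag, channels, rate, byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", data[pos + 8:pos + 24]
            )
            return {
                "format_tag": tag,  # 1 = integer PCM, 3 = IEEE float
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
                "kbit_per_s": byte_rate * 8 / 1000,
            }
        pos += 8 + size + (size % 2)  # skip other chunks; chunks are word-aligned
    raise ValueError("no 'fmt ' chunk found")


# Usage (path is a placeholder):
# with open("recording.wav", "rb") as f:
#     print(read_wav_format(f.read(4096)))
```

Reading only the first few kilobytes is usually enough, because the `fmt ` chunk normally precedes the audio data.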
For each of the 148 participants, 6 min of Lexical Verbal Fluency, 6 min of Semantic Verbal Fluency, 4.13 min of the Picture-Word Interference Paradigm, 1.5 min of the Picture Description Task, 5 min of Story Retelling, and 5 min of Story Generation were recorded.This corresponds to a total of 27.63 min of recorded speech data per subject.In total, the dataset provides 68.16 h of speech recordings.
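The per-participant and total durations follow from simple arithmetic. The sketch below reproduces it from the per-task values listed above. With the rounded 4.13 min for the picture-word interference paradigm, it yields about 68.15 h, so the reported 68.16 h presumably reflects rounding of that per-task duration. The dictionary keys are illustrative labels.

```python
# Per-participant recording durations (min) as listed in this section.
minutes_per_task = {
    "lexical_verbal_fluency": 6.0,
    "semantic_verbal_fluency": 6.0,
    "picture_word_interference": 4.13,
    "picture_description": 1.5,
    "story_retelling": 5.0,
    "story_generation": 5.0,
}
N_PARTICIPANTS = 148

per_participant_min = sum(minutes_per_task.values())
total_h = per_participant_min * N_PARTICIPANTS / 60
print(f"{per_participant_min:.2f} min per participant, {total_h:.2f} h in total")
```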
We expect this dataset to be of interest to researchers conducting exploratory or hypothesis-driven research in the field of individual differences in language and executive functioning.The dataset has already been used to predict verbal fluency scores from EF performance 32 , and to predict EF performance from a comprehensive set of verbal fluency features 33 .
The data were collected at the Forschungszentrum Jülich in Jülich, Germany, between January and September 2018 as part of a large-scale project investigating the relationship between speech and executive functioning. For each of the tests, we provide the raw data output and the speech recordings. This data descriptor comprehensively describes the acquisition and curation of the dataset, including the individual tests, the experimental procedures, and the folder structure of the data. The Data Records section describes how these data can be accessed. Researchers interested in performing exploratory and/or hypothesis-driven analyses in the field of language and cognitive performance are invited to make use of the collection presented here.

Table 1. A list of the EF tests performed by the participants.

Table 2. An overview of the different variables measured for each test. SOA = stimulus onset asynchrony.
a Time measured in seconds.
b Time measured in milliseconds.