TüEyeQ, a rich IQ test performance data set with eye movement, educational and socio-demographic information

Kasneci, Enkelejda; Kasneci, Gjergji; Appel, Tobias; Haug, Johannes; Wortha, Franz; Tibus, Maike; Trautwein, Ulrich; Gerjets, Peter

doi:10.1038/s41597-021-00938-3

Download PDF

Data Descriptor
Open access
Published: 16 June 2021

TüEyeQ, a rich IQ test performance data set with eye movement, educational and socio-demographic information

Enkelejda Kasneci¹,
Gjergji Kasneci²,
Tobias Appel ORCID: orcid.org/0000-0002-2596-2439^1,2,
Johannes Haug²,
Franz Wortha⁴,
Maike Tibus³,
Ulrich Trautwein ORCID: orcid.org/0000-0003-0647-0057³ &
…
Peter Gerjets⁴

Scientific Data volume 8, Article number: 154 (2021) Cite this article

3874 Accesses
7 Citations
9 Altmetric
Metrics details

Subjects

Education

This article has been updated

Abstract

We present the TüEyeQ data set - to the best of our knowledge - the most comprehensive data set generated on a culture fair intelligence test (CFT 20-R), i.e., an IQ Test, consisting of 56 single tasks, taken by 315 individuals aged between 18 and 30 years. In addition to socio-demographic and educational information, the data set also includes the eye movements of the individuals while taking the IQ test. Along with distributional information we also highlight the potential for predictive analysis on the TüEyeQ data set and report the most important covariates for predicting the performance of a participant on a given task along with their influence on the prediction.

Measurement(s)	intelligence • eye movement • Socioeconomic Factors
Technology Type(s)	culture fair intelligence test (CFT-R) • eye tracking device • Questionnaire
Sample Characteristic - Organism	Homo sapiens

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.14173244

Scaling neural machine translation to 200 languages

Article Open access 05 June 2024

Genome-wide association studies

Article 26 August 2021

Testing theory of mind in large language models and humans

Article Open access 20 May 2024

Background & Summary

For many decades, research in various fields has been devoted to the question of what constitutes human intelligence^1,2, the ways in which it develops in the course of our life³, and how it can be positively influenced⁴. Additionally, recent developments in the field of artificial intelligence are pushing methodological approaches towards the mimicking of human reasoning and problem solving strategies.

To support the research community and the work at the intersection of psychological and educational sciences and artificial intelligence, we provide the TüEyeQ data set (see Fig. 1). We collected a comprehensive data set from 315 university students performing a culture fair intelligence test (CFT 20-R). The CFT 20-R consists of 4 blocks, each with a fixed time limit and items that successively increase in difficulty. Along with the performance data, we provide socio-demographic and educational background information on the students as well as carefully annotated eye movement data of all participants during task solving. We believe that this data set will boost the research in various fields and will contribute to highly interesting research questions:

For psychology and psychometric research the question of how perception through eye movements and IQ (as measured by the CFT 20-R test) relate to each other can be thoroughly analysed (e.g.⁵). Also the question which tasks and how many of those tasks are needed to derive the respective conclusions can be explored.
For educational research, the relationship between an individual’s performance on a standardized IQ Test and socio-demographic and educational background can be further explored.
From a socio-economic perspective the question of whether the IQ-related performance of a designated and coherent group of people can be boosted by adequately assigning tasks and aggregating answers can be further analyzed (e.g.^6,7).
From a data science perspective the data set provides valuable means to analyse performance bias with respect to background information on the participants, such as education, training, viewing behaviour, gender and many more.
From the cognitive science perspective, this data set can make an important contribution to the study of strategy-related indicators in the context of (complex) problem solving (e.g.⁸). Additionally, TüEyeQ can support research on the relationship between eye movements and (fluid) intelligence (e.g.^9,10).
In the research field of human-computer interaction our data set can be used to explore various aspects of cognitive load and adaptation. More specifically, the user performance and behavioral data as described by the eye movements and pupillary measures can be employed to identify indicators of cognitive load and how they can be used to predict task difficulty under time constraints.
From the AI perspective the question of whether machine learning algorithms can learn to reason as humans and whether it is possible to develop an AI system that correctly solves such a test is among current challenges in AI research (e.g.^11,12,13).

Methods

Experimental design description

For TüEyeQ, 315 healthy participants (217, female, 94 male, 4 not stated; with an age mean of 23.272 years, SD 3.022) with a university entrance qualification, without neurological or psychiatric pre-existing conditions, and no visual impairment above 3 dioptres participated in our study. All participants underwent a large-scale study which aims at investigating the interaction between different partial abilities relevant for self-regulation in educational contexts. For this purpose, 321 participants were recruited. They performed different cognitive tasks (measuring, for instance, executive functioning or IQ) and filled in multiple questionnaires regarding self-regulation indicators (e.g., motivational beliefs or personality traits) in a lab setting during three sessions, each lasting up to 4 hours. Unfortunately, we had to exclude 4 participants because they did not complete all study sessions, and two more participants were excluded because of technical issues. For their participation in the study, the participants received remuneration of 8 EUR per hour and additionally 15 EUR in case they participated in all sessions. The participation in the study could be revoked at any time. All collected data were anonymized and treated confidentially.

In this work, we report specifically on performance data of this cohort of participants in an IQ test, their eye movement behavior during solving this IQ test and their socio-demographic and educational background characteristics, including information on software usage and leisure time activities. The questionnaire on socio-demographic and educational background of the participants was the first questionnaire of the first session of the study. The participants completed the CFT test immediately afterwards.

The Ethical Review Board of the Psychological Institute at the University of Tübingen approved the protocol of the study. All participants were informed in written form and consented that their anonymous data can be analyzed and published. Due to a self-constructed pseudonym, they had the option to revoke this consent at any time.

The CFT 20-R test

In our experiment, we employed the first part of the revised version of the culture fair intelligence test (CFT-R) designed by Weiß et al.¹⁴. This IQ-test is intended to measure the general mental capacity (i.e., the g-factor of intelligence or fluid intelligence) by means of tasks that require the ability to recognize figural relationships and to engage in formal logical thinking in problems of varying degrees of complexity under time restriction. Since the tasks administered are language-free, it is assumed that people with poor language knowledge and poor cultural techniques are not disadvantaged. The CFT 20-R consists of four blocks of different tasks, namely series continuation, classifications, matrices and topological conclusions. Each of these blocks has 11–15 items with increasing difficulty and a time limit of 3–4 minutes. Details and exemplary test items are presented in Fig. 2.

In order to record the eye movements of the participants during the task, we adapted the classic pen-and-paper version of the IQ test to a digital one that can be displayed on a computer screen. To be as close to the paper version as possible, we presented as many items as possible on a single screen page as long as this did not necessitate scrolling.

Participants first received general instructions about the nature of the test, followed by the first block. Each block had a specific instruction, introducing the participants to the requirements of this block and demonstrating the essence of the task based on 3 examples. The instruction phase was conducted without time constraints, thus all participants could go through the examples and make themselves familiar with the test procedure. All instructions were presented in German using the SoSci Survey online platform.

Data acquisition

Data collection took place in a digital classroom equipped with 30 remote eye trackers attached to laptops with 17inch HD display screens running at full brightness with a resolution of 1920 × 1080. This setup allows for data collection of up to 30 participants simultaneously, minimizing the overall time needed for collection. For this study, verbal instructions were given en masse pertaining to a brief overview of the protocol and an explanation of eye tracking, then individual calibrations were performed with a supervised quality check. Interactions between the participants and the computer took place via mouse or touch pad, depending on participants’ preference. The distance between participants and their respective screen was 50–70 cm depending on participants’ preferences.

The collection environment had the room illumination level controlled with no effects from sunlight or other outdoor light. The standard maintained illuminance for the experimental sessions was between 10 to 50 lux, measured with a Lux sensor (i.e., Gossen Mavo-Max illuminance sensor, MC Technologies, Hannover, Germany).

Eye movement data was collected by means of SMI RED250 remote eye trackers, a commercial eye tracker with 250 Hz sampling frequency. Since the eye tracker has a high sampling frequency, both stable (fixations) and rapid (saccadic) eye movements for static stimuli can be measured. Eye movements were recorded using the included eye-tracking software Experiment Center which outputs the raw gaze data consisting of x and y coordinates of each data point, the timestamp information and the pupil diameter in millimeters.

Calibration was performed for all participants using SMI’s built in 9-point calibration. A validation also was performed as a quality check to measure the gaze deviation for both eyes from a calibration point: A deviation larger than one degree required re-calibration. Calibrations were performed prior to the experiments as well as one or two times during the experimental session, depending on how many images were presented.

Data Preprocessing

Quality of eye tracking data

Initially, the raw gaze data was examined for signal quality using the eye-tracking software BeGaze provided along with the eye trackers. This software reports proportion of valid gaze signal to stimulus time as the tracking ratio. Therefore, if a participant’s tracking ratio was deemed insufficient (i.e., less than 80% for at least a part of the task), we omitted his or her data. This pre-processing stage can assure that errors (e.g. post-calibration shifts, poor signal due to glasses) in the gaze data are substantially minimized. Consequently, eye-tracking data of 58 participants had to be omitted due to low tracking ratios. Further 11 data sets were excluded due to errors in the presentation software and another 17 because of incomplete data. This leaves us with eye tracking data for only 229 of 315 participants. The raw eye-tracking data was then pre-processed to improve the data quality and to extract several features.

Fixations

Fixations are periods where the eye is stationary and a single location is gazed upon usually lasting between 200 ms and 350 ms¹⁵. Fixations not only contain information regarding what exact location participants look at, but also provide useful characteristics like frequency and duration. Longer fixations are associated with higher processing load and more effort^16,17. As a further consequence, frequency of fixations increases as their duration increases. Fixation information was extracted from the eye-tracking protocols based on the I-VT algorithm¹⁸ as implemented by the Perception’s Engineer Toolkit¹⁹. As a minimum fixation duration a threshold of 75 ms was chosen.

Saccades

Saccades are rapid eye movements that allow to change the focus of attention. As for fixations, saccades were extracted from the eye-tracking protocols based on the I-VT algorithm¹⁸ with a velocity threshold of 30°/s. Since the velocity of saccades is not voluntarily controlled²⁰, but depends on the neural activity, our data provides a powerful resource to study gaze parameters in cognitive engagement.

Microsaccades

Microsaccades are fixational eye movements, which occur during an especially prolonged fixation. They are similar to regular saccades, but smaller, jerk-like, and happen involuntarily. Their properties are linked to visual attention^21,22, perception²³, working memory²⁴, and task difficulty²⁵. To identify microsaccades we followed the procedure proposed by Krejtz et al.²⁶. The reported microsaccade-related features are comprised of the occurrence of microsaccades per fixation, their mean peak velocity, and their mean amplitude.

Pupil-related features

Pupil diameter has been used as an indicator of cognitive load for a very long time. Already in 1964, Hess and Polt reported that increasing task difficulty lead to increased pupil diameters²⁷. Since then, this finding has been replicated across many different domains including short-term memory, language processing, reasoning, perception, sustained attention and selective attention^28,29,30. Pupil dilation, however, is not only invoked by cognitive load³¹, but also emotional arousal³² or lightning conditions, fatigue^33,34, pain³⁵, and distance to a fixated object³⁶. A comprehensive overview of factors effecting the pupil is provided by Mathot³⁷.

To enable further research in the above mentioned areas based on our IQ-Data, we pre-processed the recorded pupil signal to enhance its quality and reduce the influence of artifacts and noise. For example, blink artifacts were removed by removing eye-tracking data in a time window 100 ms before and after a blink. During this period, the pupil may still be effected by the blink and recorded diameters are unlikely to accurately reflect any of the above mentioned effects (e.g., cognitive load, fatigue, etc). Furthermore, we removed pupil diameter values smaller than 0 (i.e., measurement errors) and replaced them by missing values. Finally, we interpolated small gaps of up to 50 ms (which corresponds to 12 data points at a sampling rate of 250 Hz) and smoothed the data using a third order low-pass Butterworth filter with a 2 Hz cutoff as proposed by³⁸.

From the processed pupil signal, we extract the mean which is provided as an additional feature along with fixation and saccade information.

Visual scanpath annotation

For each test item, we carefully annotated the visual search behaviour as the spatio-temporal sequence of fixations and saccades, i.e. the visual scanpath, during task solving. Figure 3 shows an exemplary test item as included within Block 1 of the CFT-R test. The participant has to find the best matching shape (i.e., target) among four distractors, which provides the best continuation of the given shape series. Semantic annotations in so-called Areas of Interest (AoIs) is visualized with a black box around specific parts of the stimulus. The figure further features an exemplary scanpath of a participant, where fixation locations are visualized by ellipses and saccades as arrows or vectors.

Figures 4 and 5 show the visual scanpaths of two participants solving the same task. The participant, whose visual scanpath is shown in Fig. 4, finds the solution (target) within very few fixations. Since the further options are not even visually explored, we can argue that the task was solved at high certainty. In contrast, the visual scanpath of the second participant, depicted in Fig. 5, reveals a visual exploration strategy which reflects a high uncertainty and anticipates a wrong solution to the given task. This example highlights the potential for further in-depth analysis of visual scanpaths and other eye movement measures related to (fluid) intelligence.

IQ task performance data

Performance in the CFT was dichotomized for each item, where one corresponds to a correct solution and zero to an incorrect solution. We decided to indicate items where no answer was selected as missing values because these missing values convey additional information (e.g., items that were skipped are identifiable trough this coding). Due to technical problems, six items in the CFT had issues in a distractor or target. Specifically, the eleventh item from the matrices and the third item from the topological conclusions block displayed an erroneous target during the experiment (i.e., the correct answer was not included in the possible options). The four other items (from the six with technical issues) included an inaccurate distractor. To reconcile these issues and maintain the validity of the overall task performance multiple steps were taken. First, all affected items were marked in the data set (suffix ‘e’). Second, the two items with erroneous targets were treated as missing, as participants were not able to solve them correctly. Third, sum scores for the items with inaccurate distractors, the regular items, and both, were compared through correlations. The analyses showed that performance on the items with inaccurate distractor and performance on the correctly displayed items (r(315) = 0.412, p < 0.001) as well as the overall sum scores with and without items with inaccurate distractors (r(315) = 0.992, p < 0.001) correlated significantly. This indicates that the items with an inaccurate distractor still measure performance that corresponds to overall task performance and are eligible for further analyses.

Socio-demographic Information

In addition to the above mentioned performance and eye movement features, we report socio-demographic and educational background information for each participant. The background information also includes leisure and gaming activities, software and internet usage, programming experience, and many more. A detailed description of these features along with their representation (i.e., encoding) in the data set is provided in Table 1.

Table 1 Description and encoding of all performance-related, educational and socio-demographic features in the order of their appearance in the csv file.

Full size table

To enable comparisons with related studies, we categorized the string variables “job_father”, “job_mother”, “leisure”, and “study_subjects” according to common taxonomies.

Specifically, we have categorized the leisure activities according to³⁹, who distinguish between 13 recreational activities. Since most of our participants take part in several leisure activities, we mapped the activities to a one-hot encoding scheme, which corresponds to all features with the prefix “leisure_[…]” in the data set.

The International Standard Classification of Education (ISCED) provides a framework of definitions that enable a comparison of education systems. We categorized our “study_subjects” according to the Fields of Education and Training⁴⁰ of ISCED.

Similarly, the International Standard Classification of Occupations (ISCO-08)⁴¹ defines different groups of occupations. However, we found that ISCO-08 does not adequately represent the range of occupations that was provided by our German participants. Specifically, ISCO-08 would have introduced a significant imbalance with respect to the “Professionals” category. Instead, we applied the taxonomy of stepstone.de (as of Nov. ’20), which is one of the most popular online job markets in Germany. This taxonomy led to a much more fine-grained and up-to-date categorization in “job_father” and “job_mother”.

Data Records

The TüEyeQ data set is available through the Harvard Dataverse Repository⁴².

The performance data along with the socio-demographic and education background information is provided as a single csv-file (cft-full) with 17,010 rows (observations) and 79 columns (features). For each of these features, Table 1 provides both the corresponding description and the encoding.

Eye movement information is provided in the folder (EyeMovementData) and comes in three formats. The first is a participant- and stimulus-wise raw format in folder (raw) that has only been pre-processed as described in the Section Data Preprocessing and consequently contains features for each fixation and saccade. Each subfolder here contains the raw eye-movement data for the participant with the participant ID being the same as the subfolder name. The therein contained files are named according to the corresponding stimulus (e.g., 1–3.csv means that this screen page contains the tasks 1 to 3 from the CFT test). Furthermore, a schematic version of each page of the CFT is provided in the folder (stimuli). For copyright reasons, this folder contains only the AOIs, the specific content of which can be retrieved from the CFT 20-R test¹⁴.

Mouse clicks of all participants are available in (clicks) – a csv file offering data about each click that was performed by a participant, describing thus, when and where a mouse click occurred, as well as on which part of what IQ task. This information is important for data analysis, since it details when a participant chose an answer and when he or she changed it.

In addition to the raw gaze data, we provide a convenience format stored in the folder (split), where the eye movement data is split up task-wise. This format features folders with participants’ IDs that hold task-wise data (e.g., task_07.csv for the seventh task of the CFT 20-R). For this convenience format, participants with substantial offsets in their fixations or noise that could indicate extensive head movements were removed after a visual inspection of the raw eye movement data. This accounted for the removal of 43 participants reducing the set to 186. Most of these 43 participants may be included after manual correction of the fixations at the reader’s discretion. The procedure for the assignment of events to tasks is described in short as follows:

Areas deemed relevant were the areas of each sub-task as well as the time remaining and the overall progress bar.
Any other area was considered task-irrelevant
Fixations on task-irrelevant AoIs were removed
Saccades that neither started nor ended on a relevant AoI were removed
Event outliers were removed by considering the 5 events before and after this event. If less than 30% if these events shared the same AoI, the fixation was considered an outlier.
All events on the AoIs of a task that happened after the participant had performed their final click for that task were removed.
Events involving either the progress bar or the timer were assigned to the task that had the most recent valid gaze.

For each participant, the eye movement information is provided as cvs files which contain the eye movement and pupil features. All coordinates have their origin in the top left corner of the screen and the x and y axis extend to the right and bottom respectively. The csv files contain the following features:
starttime: Timestamp information
duration: Duration of the event
meanPupilDiameter: Only for fixations, mean pupil diameter
eye: Left or right eye
eventType: Fixation or saccade
eventIdxLeft: Index of left eye event
eventIdxRight: Index of right eye event
meanX: Only for fixations, mean X coordinates
meanY: Only for fixations, mean Y coordinates
startSaccadeX: Only for saccades, X coordinates of saccade starting point
startSaccadeY: Only for saccades, Y coordinates of saccade starting point
endSaccadeX: Only for saccades, X coordinates of saccade ending point
endSaccadeY: Only for saccades, X coordinates of saccade ending point
microsaccadeCount: Only for fixations, number of microsaccades
microsaccadeAmplitide: Only for fixations, mean amplitude of microsaccades
microsaccadePeakVelocity: Only for fixations, mean peak velocity of microsaccades

Eye movements and mouse clicks share the same timestamp format, which makes them easy to synchronize for analysis. Furthermore, eye movements, AoIs, and mouse clicks share a common coordinate system allowing for convenient use of all three data sources.

Finally, we provide aggregated eye-tracking data for each participant and task that is already included in the ready-to-use format. This data is contained in the file named eye_features_aggregated.csv and consists of mean values for each participant and task. For each participant and task, this file provides mean values of fixation duration, saccade count, saccade amplitude, saccade duration, microssacade count, microsaccade amplitude, microsaccade duration, microsaccade peak velocity, and pupil diameter.

Technical Validation

Bias considerations

As Fig. 6 shows, there is neither an age-related nor a gender-related bias with respect to the overall performance over all tasks. More specifically, there is no significant difference in performance between male and female participants across the different age groups (right plot in Fig. 6). The four participants whose gender was unknown were omitted in this analysis.

Distribution of the aggregated CFT score

Also, as shown in Fig. 7, the overall performance of the participants – encoded by the variable cft_sum_full – is, as expected, approximately normally distributed. As indicated by the distribution, there are 7 participants (with participant IDs: ‘AAB14’, ‘ACB13’, ‘ATT30’, ‘BUO15’, ‘OAK22’, ‘SKA02’, ‘VWK01’) – represented by the two bars on the very left-hand side of the distribution plot - who show an unexpectedly low overall performance. We hypothesize that these participants did not take the experiment seriously or simply went through the tasks without thinking carefully about the possible solutions. Many of the CFT tasks were left unsolved by these participants.

Performance distribution over the CFT tasks in the order of their appearance

Figure 8 shows a histogram over all solved tasks in the order of their appearance in the CFT 20-R. Moreover, for every task, we show the incremental expected number of participants who can correctly solve a given task from the ones seen so far. We see that while the incremental mean decreases, the number of tasks that were skipped by the participants increases, especially within CFT blocks. This is in accordance with the intuition that CFT tasks are ordered (especially within the CFT blocks) by increasing degree of difficulty.

Discriminative information for predicting a participant’s performance on a given CFT task

In order to show that our data set does indeed contain discriminative information with respect to a participant’s performance on a given CFT task, we performed a binary classification on cft_task using the socio-demographic features. To this end, we ignored all observations with missing values (1,248 entries), i.e. all tasks, where no answer was provided. In general, however, these missing values carry semantic meaning and may be worth further investigating. Besides, we removed the feature participant because it represents an identifier and as such it inadequately increases the dimensionality of the feature vectors. The variable task_id, however, despite representing an identifier, reflects the order in which the CFT tasks occur and thus encodes the difficulty of tasks. Since CFT tasks are ordered by increasing order of difficulty within each CFT block and across blocks, we expect task_id to be a highly discriminative feature with respect to the performance of a participant on a given CFT task. We also removed cft_sum_full, which is an aggregation of the target and could thus cause information leakage.

We applied the following pre-processing steps:

We factorized the string features (i.e. we mapped them to integers).
We imputed the NaN-values in all categorical features with a new category.
We imputed the NaN-values in all continuous features with their median.
Additionally, we normalized all continuous features to the interval [0,1], by using the MinMaxScaler of scikit-learn. The normalization was required for the Logistic Regression experiment, since we applied an \(\ell \)2 regularization.

We randomly split the data into a training (80%) and test set (20%) and trained a Logistic Regression model on TüEyeQ. We specified an \(\ell \)2 regularization and a maximum of 1,000 iterations to train the Logistic Regression model.

To illustrate the importance of input features for the prediction (see Figs. 9 and 10), we computed Shapley values according to⁴³ and Lime values according to TabularLime⁴⁴. Specifically, we approximated Shapley values with LinearSHAP⁴³ for the Logistic Regression model. Furthermore, we also report the coefficients of the Logistic Regression model. Figure 10 and 9 (right) show that there tends to be an agreement between Shapely values, the Lime attributions and the Logistic Regression coefficients with respect to the variable influences on the scores produced by the Logistic Regression model. More specifically, variables like task_id (which implicitly encodes the task difficulty) and grades_math along with other grades and variables related to gaming, online and leisure activities are assigned by all approaches high absolute scores, indicating a high importance of these aspects for the prediction model. Especially the importance of the gaming-related variables in the model is in alignment with recent research investigating the relationship between gaming and fluid intelligence (e.g.⁴⁵), thus providing through TüEyeQ new means for further analysis of such relationships in a thorough way including eye movement behavior.

Variables with highest correlation to the aggregated CFT score

Interestingly, as shown in Table 2, the two variables with the strongest Distance Correlation⁴⁶ to the aggregated CFT score, cft_sum_full, are grades_math and mean_grade_degree. This is in alignment with the importance attribution results discussed above, where these two variables were assigned a high importance for the Logistic Regression model. However, in addition, we also see that other variables like grades_chemistry or grades_physics, which were not regarded as important by the previous attribution schemes, are among the variables with the highest correlation to cft_sum_full. These results together with those discussed in the previous paragraph show that the analysis of the variables’ impact on the performance in an IQ test can be quite intricate and needs to involve different measures and cover various aspects.

Table 2 Variables with the highest Distance Correlation⁴⁶ to the aggregated CFT score (cft_sum_full).

Full size table

Correlation between variables

The correlation scores in Table 3 correspond to the pairwise Distance Correlation coefficient for a random sample of 20% of the observations. Note that the Distance Correlation measures both linear and non-linear relationships between two random vectors. As depicted in Fig. 11, the pairwise correlations are centered around 0.1. The peaked distribution indicates that most of the variable pairs are uncorrelated or very weakly correlated. As shown in Table 3, only 54 out of 3,081 variable pairs have a Distance Correlation above 0.6 (and only 62 pairs have a correlation above 0.5). For the Logistic Regression model described above, we did not remove highly correlated variables, which might entail collinearity issues. Hence, we believe that better predictability could be achieved if more effort was invested in the data analysis and preprocessing steps.

Table 3 All pairs of variables with a Distance Correlation⁴⁶ above 0.6.

Full size table

Usage Notes

The data can be downloaded from https://doi.org/10.7910/DVN/JGOCKI as a csv file. All experiments are provided as a Jupyter notebook and may therefore be easily reproduced.

When using the TüEyeQ dataset or parts of it, please cite this manuscript.

Besides the data provided in this work, the authors can provide access to further variables and performance data of the participants on other tasks upon request. Such data comprises for example information on the “Big Five” personality factors, on internal and external locus of control, self-efficacy, attribution styles, academic self-concepts, domain-specific interests, motivational strategies, process and questionnaire data on learning success, learning processes and learning prerequisites as well as performance data for test items measuring executive control functions such as, flexibility, updating, inhibition (n-back, operation and reading Span items, Stroop item performace, Stop Signal Test, Trail Making Test). In this case, a sub-selection of the dataset, whose size will be defined based on the specific requests from interested users and on the processing time needed will be provided. For any questions, suggestions or request of collaboration regarding TüEyeQ please contact the corresponding author. This data set is freely available under the CC0 license.

Code availability

All Python-code corresponding to the evaluations described in this work is distributed on GitHub under the MIT license https://github.com/haugjo/TueEyeQ. To run the evaluation, the following packages are required (note that older or more recent versions might also work):

• python (v3.7.3)

• numpy (v1.18.1)

• pandas (v0.25.1)

• scikit-learn (v0.21.3)

• matplotlib (v3.1.3)

• shap (v0.34.0)

• lime (v0.2.0.1)

• dcor (v0.5.2)

• pickleshare (v0.7.5, only required to load the precomputed Distance Correlation scores).

Change history

26 October 2021
The Funding information section was missing from this article and should have read ‘Open Access funding enabled and organized by Projekt DEAL’. The original article has been corrected.

References

Engle, R. W., Tuholski, S. W., Laughlin, J. E. & Conway, A. R. Working memory, short-term memory, and general fluid intelligence: a latent-variable approach. Journal of experimental psychology: General 128, 309 (1999).
Article Google Scholar
Gray, J. R., Chabris, C. F. & Braver, T. S. Neural mechanisms of general fluid intelligence. Nature neuroscience 6, 316–322 (2003).
Article CAS Google Scholar
Fry, A. F. & Hale, S. Processing speed, working memory, and fluid intelligence: Evidence for a developmental cascade. Psychological science 7, 237–241 (1996).
Article Google Scholar
Jaeggi, S. M., Buschkuehl, M., Jonides, J. & Perrig, W. J. Improving fluid intelligence with training on working memory. Proceedings of the National Academy of Sciences 105, 6829–6833 (2008).
Article ADS CAS Google Scholar
Di Fabio, A. & Saklofske, D. H. Comparing ability and self-report trait emotional intelligence, fluid intelligence, and personality traits in career decision. Personality and Individual Differences 64, 174–178 (2014).
Article Google Scholar
Bachrach, Y., Graepel, T., Kasneci, G., Kosinski, M. & Van Gael, J. Crowd iq: aggregating opinions to boost performance. Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 1, 535–542 (2012).
Google Scholar
Kosinski, M., Bachrach, Y., Kasneci, G., Van-Gael, J. & Graepel, T. Crowd iq: Measuring the intelligence of crowdsourcing platforms. In Proceedings of the 4th annual ACM web science conference, 151–160 (2012).
Stankov, L. Complexity, metacognition, and fluid intelligence. Intelligence 28, 121–143 (2000).
Article Google Scholar
Wilson, P., Papageorgiou, K. A. & Cooper, C. Speed of saccadic responses and intelligence: An exponential-gaussian analysis. Personality and Individual Differences 158, 109860 (2020).
Article Google Scholar
Sargezeh, B. A., Ayatollahi, A. & Daliri, M. R. Investigation of eye movement pattern parameters of individuals with different fluid intelligence. Experimental brain research 237, 15–28 (2019).
Article Google Scholar
Zhang, C., Gao, F., Jia, B., Zhu, Y. & Zhu, S.-C. RAVEN: A Dataset for Relational and Analogical Visual REasoNing. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 5312–5322 (2019).
Zheng, K., Zha, Z. & Wei, W. Abstract reasoning with distracting features. In Proceedings of the 32nd Advances in Neural Information Processing Systems (2019).
Zhuo, T. & Kankanhalli, M. Solving Raven’s Progressive Matrices with Neural Networks (2020). Preprint at https://arxiv.org/abs/2002.01646.
Weiß, R. CFT 20-R.: Grundintelligenztest Skala 2. Manual (Göttingen: Hogrefe Verlag, 2006).
Rayner, K. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin 124, 372 (1998).
Article CAS Google Scholar
Inamdar, S. & Pomplun, M. Comparative search reveals the tradeoff between eye movements and working memory use in visual tasks. In Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 25 (2003).
He, J. & McCarley, J. S. Executive working memory load does not compromise perceptual processing during visual search: Evidence from additive factors analysis. Attention, Perception, & Psychophysics 72, 308–316 (2010).
Article Google Scholar
Salvucci, D. D. & Goldberg, J. H. Identifying fixations and saccades in eye-tracking protocols. In Proceedings of the 2000 ACM Symposium on Eye tracking Research and Applications, 71–78 (2000).
Kübler, T. C. The perception engineer’s toolkit for eye-tracking data analysis. In Proceedings of the 2020 ACM Symposium on Eye Tracking Research and Applications, 1–4 (2020).
Leigh, R. J. & Zee, D. S. The neurology of eye movements (OUP USA, 2015).
Engbert, R. & Kliegl, R. Microsaccades uncover the orientation of covert attention. Vision research 43, 1035–1045 (2003).
Article Google Scholar
Hafed, Z. M. & Clark, J. J. Microsaccades as an overt measure of covert attention shifts. Vision research 42, 2533–2545 (2002).
Article Google Scholar
Troncoso, X. G., Macknik, S. L. & Martinez-Conde, S. Microsaccades counteract perceptual filling-in. Journal of vision 8, 15–15 (2008).
Article Google Scholar
Valsecchi, M. & Turatto, M. Microsaccadic responses in a bimodal oddball task. Psychological research 73, 23–33 (2009).
Article Google Scholar
Siegenthaler, E. et al. Task difficulty in mental arithmetic affects microsaccadic rates and magnitudes. European Journal of Neuroscience 39, 287–294 (2014).
Article Google Scholar
Krejtz, K., Duchowski, A. T., Niedzielska, A., Biele, C. & Krejtz, I. Eye tracking cognitive load using pupil diameter and microsaccades with fixed gaze. PLOS ONE 13, 1–23 (2018).
Article Google Scholar
Hess, E. H. & Polt, J. M. Pupil size in relation to mental activity during simple problem-solving. Science 143, 1190–1192 (1964).
Article ADS CAS Google Scholar
Scharinger, C., Kammerer, Y. & Gerjets, P. Pupil dilation and eeg alpha frequency band power reveal load on executive functions for link-selection processes during text reading. PloS one 10, e0130608 (2015).
Article Google Scholar
Lang, F., Kammerer, Y., Oschatz, K., Stürmer, K. & Gerjets, P. The role of beliefs regarding the uncertainty of knowledge and mental effort as indicated by pupil dilation in evaluating scientific controversies. International Journal of Science Education 42, 350–371 (2020).
Article ADS Google Scholar
Scharinger, C., Soutschek, A., Schubert, T. & Gerjets, P. When flanker meets the n-back: What eeg and pupil dilation data reveal about the interplay between the two central-executive working memory functions inhibition and updating. Psychophysiology 52, 1293–1304 (2015).
Article Google Scholar
Appel, T. et al. Predicting cognitive load in an emergency simulation based on behavioral and physiological measures. In Proceedings of the 2019 ACM International Conference on Multimodal Interaction, 154–163 (2019).
Henderson, R. R., Bradley, M. M. & Lang, P. J. Emotional imagery and pupil diameter. Psychophysiology 55, e13050 (2018).
Article Google Scholar
Hopstaken, J. F., van der Linden, D., Bakker, A. B. & Kompier, M. A. The window of my eyes: Task disengagement and mental fatigue covary with pupil dynamics. Biological Psychology 110, 100–106 (2015).
Article Google Scholar
Porta, M., Ricotti, S. & Perez, C. J. Emotional e-learning through eye tracking. In Proceedings of the 2012 IEEE Global Engineering Education Conference (EDUCON), 1–6 (2012).
Wildemeersch, D., Peeters, N., Saldien, V., Vercauteren, M. & Hans, G. Pain assessment by pupil dilation reflex in response to noxious stimulation in anaesthetized adults. Acta Anaesthesiologica Scandinavica 62, 1050–1056 (2018).
Article CAS Google Scholar
Charman, W. & Whitefoot, H. Pupil diameter and the depth-of-field of the human eye as measured by laser speckle. Optica Acta: International Journal of Optics 24, 1211–1216 (1977).
Article Google Scholar
Mathôt, S. Pupillometry: Psychology, physiology, and function. Journal of Cognition 1 (2018).
Castner, N. et al. Pupil diameter differentiates expertise in dental radiography visual search. PloS one 15, e0223941 (2020).
Article CAS Google Scholar
Leitner, M. J. & Leitner, S. F. Leisure enhancement (Haworth Press, 2004).
for Statistics, U. I. International standard classification of education fields of education and training 2013 (isced-f 2013): Detailed field descriptions. (2015).
Office, I. L. International Standard Classification of Occupations 2008 (ISCO-08): structure, group definitions and correspondence tables (International Labour Office, 2012).
Kasneci, E. et al. TüEyeQ, a rich IQ test performance data set with eye movement, educational and socio-demographic information Harvard Dataverse https://doi.org/10.7910/DVN/JGOCKI (2020).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 4765–4774 (2017).
Ribeiro, M. T., Singh, S. & Guestrin, C. “Why should i trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 1135–1144 (2016).
Kokkinakis, A. V., Cowling, P. I., Drachen, A. & Wade, A. R. Exploring the relationship between video game expertise and fluid intelligence. PLOS ONE 12, 1–15 (2017).
Article Google Scholar
Székely, G. J., Rizzo, M. L. & Bakirov, N. K. Measuring and testing dependence by correlation of distances. The annals of statistics 35, 2769–2794 (2007).
Article MathSciNet Google Scholar

Download references

Acknowledgements

This research was supported as part of the LEAD Graduate School & Research Network [GSC1028], which was funded within the framework of the Excellence Initiative of the German federal and state governments. We acknowledge support by the Open Access Publishing Fund of University of Tübingen. Enkelejda Kasneci is a member of the Machine Learning Cluster of Excellence, EXC number 2064/1 – Project number 390727645.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Human-Computer Interaction, Department of Computer Science, University of Tübingen, Tübingen, Germany
Enkelejda Kasneci & Tobias Appel
Data Science and Analytics, Department of Computer Science, University of Tübingen, Tübingen, Germany
Gjergji Kasneci, Tobias Appel & Johannes Haug
Hector Research Institute of Education Sciences and Psychology, University of Tübingen, Tübingen, Germany
Maike Tibus & Ulrich Trautwein
Leibniz-Institut für Wissensmedien, Tübingen, Germany
Franz Wortha & Peter Gerjets

Authors

Enkelejda Kasneci
View author publications
You can also search for this author in PubMed Google Scholar
Gjergji Kasneci
View author publications
You can also search for this author in PubMed Google Scholar
Tobias Appel
View author publications
You can also search for this author in PubMed Google Scholar
Johannes Haug
View author publications
You can also search for this author in PubMed Google Scholar
Franz Wortha
View author publications
You can also search for this author in PubMed Google Scholar
Maike Tibus
View author publications
You can also search for this author in PubMed Google Scholar
Ulrich Trautwein
View author publications
You can also search for this author in PubMed Google Scholar
Peter Gerjets
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

E.K., T.A., M.T., U.T. and P.G. conceived the experiment, T.A., F.W. and M.T. conducted the experiment, E.K., T.A., G.K., J.H. and F.W. prepared the data and analysed the results. All authors reviewed the manuscript.

Corresponding author

Correspondence to Enkelejda Kasneci.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

Reprints and permissions

About this article

Cite this article

Kasneci, E., Kasneci, G., Appel, T. et al. TüEyeQ, a rich IQ test performance data set with eye movement, educational and socio-demographic information. Sci Data 8, 154 (2021). https://doi.org/10.1038/s41597-021-00938-3

Download citation

Received: 01 December 2020
Accepted: 11 May 2021
Published: 16 June 2021
DOI: https://doi.org/10.1038/s41597-021-00938-3

Subjects

Abstract

Similar content being viewed by others

Scaling neural machine translation to 200 languages

Genome-wide association studies

Testing theory of mind in large language models and humans

Background & Summary

Methods

Experimental design description

The CFT 20-R test

Data acquisition

Data Preprocessing

Quality of eye tracking data

Fixations

Saccades

Microsaccades

Pupil-related features

Visual scanpath annotation

IQ task performance data

Socio-demographic Information

Data Records

Technical Validation

Bias considerations

Distribution of the aggregated CFT score

Performance distribution over the CFT tasks in the order of their appearance

Discriminative information for predicting a participant’s performance on a given CFT task

Variables with highest correlation to the aggregated CFT score

Correlation between variables

Usage Notes

Code availability

Change history

26 October 2021

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links