Abstract
Viewing behavior provides a window into many central aspects of human cognition and health, and it is an important variable of interest or confound in many functional magnetic resonance imaging (fMRI) studies. To make eye tracking freely and widely available for MRI research, we developed DeepMReye, a convolutional neural network (CNN) that decodes gaze position from the magnetic resonance signal of the eyeballs. It performs camera-less eye tracking at sub-imaging temporal resolution in held-out participants with little training data and across a broad range of scanning protocols. Critically, it works even in existing datasets and when the eyes are closed. Decoded eye movements explain network-wide brain activity, including in regions not associated with oculomotor function. This work emphasizes the importance of eye tracking for the interpretation of fMRI results and provides an open-source software solution that is widely applicable in research and clinical settings.
Data availability
We analyzed data from multiple previous reports, which can be requested from the respective authors. Dataset 1 is part of a larger data-sharing initiative and can be downloaded at http://fcon_1000.projects.nitrc.org. We further share example data to illustrate our pipeline on the Open Science Framework (https://doi.org/10.17605/OSF.IO/MRHK9) (see also the ‘Code availability’ statement), as well as the source data for Figs. 1–4 and Extended Data Figs. 1–10. Moreover, we share pretrained model weights estimated on the datasets used in the present work. These model weights allow decoding of viewing behavior without retraining the model in certain scenarios (see the online documentation at https://github.com/DeepMReye for more details). Source data are provided with this paper.
Code availability
The DeepMReye model code can be found on GitHub (https://github.com/DeepMReye), along with user documentation and a frequently asked questions page. Moreover, we share Colab notebooks that illustrate the use of DeepMReye on example data. Finally, we share example eye-tracking calibration scripts that can easily be adapted to acquire training data for DeepMReye.
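As an illustration only, the following minimal Python sketch generates a randomized sequence of fixation-target positions and onsets of the kind such a calibration run might present; it is a hypothetical example, not one of the scripts shared in the repository, and the grid size, eccentricities and durations are arbitrary assumptions.

```python
# Illustrative sketch (not one of the shared calibration scripts): generate a randomized
# sequence of fixation-target positions and onsets that could be presented in the scanner
# to obtain gaze labels for model training. Positions and durations are arbitrary choices.
import numpy as np

rng = np.random.default_rng(42)

# 5 x 5 grid of target positions in visual degrees, each shown for 2 s.
x = np.linspace(-8, 8, 5)
y = np.linspace(-6, 6, 5)
targets = np.array([(xi, yi) for xi in x for yi in y])
rng.shuffle(targets)                      # randomize presentation order (shuffles rows)

duration_s = 2.0
onsets = np.arange(len(targets)) * duration_s

# Each row: onset (s), x (deg), y (deg) -- the target coordinates later serve as training labels.
calibration_sequence = np.column_stack([onsets, targets])
print(calibration_sequence[:5])
```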
References
Anderson, T. J. & MacAskill, M. R. Eye movements in patients with neurodegenerative disorders. Nat. Rev. Neurol. 9, 74–85 (2013).
Morrone, M. C., Ross, J. & Burr, D. Saccadic eye movements cause compression of time as well as space. Nat. Neurosci. 8, 950–954 (2005).
Berman, R. A. et al. Cortical networks subserving pursuit and saccadic eye movements in humans: an fMRI study. Hum. Brain Mapp. 8, 209–225 (1999).
Petit, L. & Haxby, J. V. Functional anatomy of pursuit eye movements in humans as revealed by fMRI. J. Neurophysiol. 82, 463–471 (1999).
McNabb, C. B. et al. Inter-slice leakage and intra-slice aliasing in simultaneous multi-slice echo-planar images. Brain Struct. Funct. 225, 1153–1158 (2020).
Voss, J. L., Bridge, D. J., Cohen, N. J. & Walker, J. A. A closer look at the hippocampus and memory. Trends Cogn. Sci. 21, 577–588 (2017).
Tregellas, J. R., Tanabe, J. L., Miller, D. E. & Freedman, R. Monitoring eye movements during fMRI tasks with echo planar images. Hum. Brain Mapp. 17, 237–243 (2002).
Beauchamp, M. S. Detection of eye movements from fMRI data. Magn. Reson. Med. 49, 376–380 (2003).
Heberlein, K., Hu, X., Peltier, S. & LaConte, S. Predictive eye estimation regression (PEER) for simultaneous eye tracking and fMRI. In Proc. 14th Scientific Meeting, International Society for Magnetic Resonance in Medicine 14, 2808 (2006).
Son, J. et al. Evaluating fMRI-based estimation of eye gaze during naturalistic viewing. Cereb. Cortex 30, 1171–1184 (2020).
Alexander, L. M. et al. An open resource for transdiagnostic research in pediatric mental health and learning disorders. Sci. Data 4, 170181 (2017).
Nau, M., Schindler, A. & Bartels, A. Real-motion signals in human early visual cortex. Neuroimage 175, 379–387 (2018).
Polti, I., Nau, M., Kaplan, R., van Wassenhove, V. & Doeller, C. F. Hippocampus and striatum encode distinct task regularities that guide human timing behavior. Preprint at bioRxiv https://doi.org/10.1101/2021.08.03.454928 (2021).
Nau, M., Navarro Schröder, T., Bellmund, J. L. & Doeller, C. F. Hexadirectional coding of visual space in human entorhinal cortex. Nat. Neurosci. 21, 188–190 (2018).
Julian, J. B., Keinath, A. T., Frazzetta, G. & Epstein, R. A. Human entorhinal cortex represents visual space using a boundary-anchored grid. Nat. Neurosci. 21, 191–194 (2018).
Ehinger, K. A., Hidalgo-Sotelo, B., Torralba, A. & Oliva, A. Modelling search for people in 900 scenes: a combined source model of eye guidance. Vis. Cogn. 17, 945–978 (2009).
Wolfe, J. M. Visual search: how do we find what we are looking for? Annu. Rev. Vis. Sci. 6, 539–562 (2020).
Hebart, M. N. et al. THINGS: a database of 1,854 object concepts and more than 26,000 naturalistic object images. PLoS ONE 14, e0223792 (2019).
Duchowski, A. T. Eye Tracking Methodology: Theory and Practice 3rd edn (Springer International Publishing, 2017).
Brodoehl, S., Witte, O. W. & Klingner, C. M. Measuring eye states in functional MRI. BMC Neurosci. 17, 48 (2016).
Coiner, B. et al. Functional neuroanatomy of the human eye movement network: a review and atlas. Brain Struct. Funct. 224, 2603–2617 (2019).
Keck, I. R., Fischer, V., Puntonet, C. G. & Lang, E. W. Eye movement quantification in functional MRI data by spatial independent component analysis. In International Conference on Independent Component Analysis and Signal Separation Vol. 5441 (eds Adali, T., Jutten, C., Romano, J. M. T. & Barros, A. K.) 435–442 (Springer Berlin Heidelberg, 2009).
Franceschiello, B. et al. 3-Dimensional magnetic resonance imaging of the freely moving human eye. Prog. Neurobiol. 194, 101885 (2020).
LaConte, S. M. & Glielmi, C. B. Verifying visual fixation to improve fMRI with predictive eye estimation regression (PEER). In Proc. 15th Scientific Meeting, International Society for Magnetic Resonance in Medicine, Berlin 3438 (2007).
Sathian, K. et al. Dual pathways for haptic and visual perception of spatial and texture information. Neuroimage 57, 462–475 (2011).
O’Connell, T. P. & Chun, M. M. Predicting eye movement patterns from fMRI responses to natural scenes. Nat. Commun. 9, 5159 (2018).
Esteban, O. et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods 16, 111–116 (2019).
Tagliazucchi, E. & Laufs, H. Decoding wakefulness levels from typical fMRI resting-state data reveals reliable drifts between wakefulness and sleep. Neuron 82, 695–708 (2014).
Naselaris, T., Kay, K. N., Nishimoto, S. & Gallant, J. L. Encoding and decoding in fMRI. Neuroimage 56, 400–410 (2011).
Kriegeskorte, N. & Douglas, P. K. Interpreting encoding and decoding models. Curr. Opin. Neurobiol. 55, 167–179 (2019).
Sonkusare, S., Breakspear, M. & Guo, C. Naturalistic stimuli in neuroscience: critically acclaimed. Trends Cogn. Sci. 23, 699–714 (2019).
Lim, S.-L., O’Doherty, J. P. & Rangel, A. The decision value computations in the vmPFC and striatum use a relative value code that is guided by visual attention. J. Neurosci. 31, 13214–13223 (2011).
Koba, C., Notaro, G., Tamm, S., Nilsonne, G. & Hasson, U. Spontaneous eye movements during eyes-open rest reduce resting-state-network modularity by increasing visual-sensorimotor connectivity. Netw. Neurosci. 5, 451–476 (2021).
Murphy, K., Birn, R. M. & Bandettini, P. A. Resting-state fMRI confounds and cleanup. Neuroimage 80, 349–359 (2013).
Frey, M. et al. Interpreting wide-band neural activity using convolutional neural networks. eLife 10, e66551 (2021).
Shen, D., Wu, G. & Suk, H. I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248 (2017).
Misra, D. Mish: a self regularized non-monotonic neural activation function. Preprint at https://arxiv.org/abs/1908.08681 (2019).
Biewald, L. Experiment tracking with weights & biases. http://wandb.com/ (2020).
Kingma, D. P. & Ba, J. L. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Acknowledgements
We thank I. Polti, J.B. Julian, R. Epstein and A. Bartels for providing imaging and eye-tracking data that were used in the present work. We further thank C.I. Baker and C. Barry for helpful discussions and J.B. Julian and C.I. Baker for comments on an earlier version of this manuscript. This work is supported by the European Research Council (ERC-CoG GEOCOG 724836). C.F.D.’s research is further supported by the Max Planck Society, the Kavli Foundation, the Centre of Excellence scheme of the Research Council of Norway, Centre for Neural Computation (223262/F50), The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits and the National Infrastructure scheme of the Research Council of Norway, NORBRAIN (197467/F50).
Author information
Authors and Affiliations
Contributions
M.F. and M.N. conceptualized the present work, developed the decoding pipeline and analyzed the data with input from C.F.D. M.F. wrote the key model implementation code with help from M.N. M.N. acquired most and analyzed all datasets, visualized the results and wrote the manuscript with help from M.F. M.F., M.N. and C.F.D. discussed the results and contributed to the manuscript. M.F. and M.N. share first authorship. M.N. and C.F.D. share senior authorship.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Neuroscience thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Predicted error (PE) correlates with the Euclidean error between real and predicted gaze positions.
This allows filtering the test set post-decoding based on estimated reliability. A) Results for models trained and tested using the fixation-target coordinates. B) Results for models trained and tested using labels acquired with camera-based eye tracking. A, B) We plot single-participant data (dots) with a regression line as well as group-level whisker-box plots (central line: median, box: 25th and 75th percentiles, whiskers: all data points not considered outliers, outliers: data points outside 1.5x the interquartile range). Participants were split into the 80% most reliable (low PE, blue) and the 20% least reliable (high PE, orange). All scores are expressed in visual degrees.
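A minimal sketch of this kind of reliability split is shown below; it uses synthetic NumPy arrays rather than the published analysis code, and all array names and shapes are illustrative assumptions.

```python
# Minimal sketch with synthetic data (not the published analysis code): relate a
# per-participant predicted error (PE) to the Euclidean error between true and decoded
# gaze, and split participants into the 80% most vs. 20% least reliable.
import numpy as np

rng = np.random.default_rng(0)
n_participants, n_volumes = 25, 200

# Synthetic stand-ins for true gaze, decoded gaze (both in visual degrees) and PE scores.
true_gaze = rng.uniform(-5, 5, size=(n_participants, n_volumes, 2))
decoded_gaze = true_gaze + rng.normal(0, 1, size=true_gaze.shape)
predicted_error = rng.uniform(0.5, 2.0, size=n_participants)

# Mean Euclidean distance between true and decoded gaze per participant.
euclidean_error = np.linalg.norm(true_gaze - decoded_gaze, axis=-1).mean(axis=1)

# Split at the 80th PE percentile: low-PE (most reliable) vs. high-PE (least reliable).
cutoff = np.percentile(predicted_error, 80)
low_pe = predicted_error <= cutoff     # 80% most reliable participants
high_pe = ~low_pe                      # 20% least reliable participants

# Correlation between PE and Euclidean error (with real data, this is the
# relationship shown in the figure).
r = np.corrcoef(predicted_error, euclidean_error)[0, 1]
```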
Extended Data Fig. 2 Quantifying gaze decoding in high predicted error and out-of-sample participants.
A) Group-level gaze decoding results expressed as the coefficient of determination (R2). The top panel shows the R2-score implemented in scikit-learn40 computed between the true and decoded gaze trajectories for the five key datasets featuring fixation, smooth pursuit (three datasets) and visual search. Note that R2 can range from negative infinity to one. Participants are color coded according to predicted error (PE). We plot whisker-box plots for low-PE participants (central line: median, box: 25th and 75th percentiles, whiskers: all data points not considered outliers, outliers: data points outside 1.5x the interquartile range) and single-participant data for all participants (dots). B) Group-average spread of decoded positions around true positions, collapsed over time and expressed in visual degrees, for participants with high predicted error (orange dots in A).
Extended Data Fig. 3 Model evaluation across different decoding schemes.
A) Within-participant gaze decoding obtained by training and testing the model on different data partitions of all participants within a dataset. B) Across-dataset gaze decoding obtained using leave-one-dataset-out cross-validation. We plot the R2-score as implemented in scikit-learn40 between the true and decoded gaze trajectories for the five key datasets featuring fixation, smooth pursuit (three datasets) and visual search. Note that R2 can range from negative infinity to one. The results for datasets 1–3 were obtained using the fixation-target labels; those for datasets 4–5 were obtained using camera-based eye-tracking labels. Participants are color coded according to predicted error (PE). A, B) We plot whisker-box plots for low-PE participants (central line: median, box: 25th and 75th percentiles, whiskers: all data points not considered outliers, outliers: data points outside 1.5x the interquartile range) and single-participant data for all participants (dots).
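For readers unfamiliar with this cross-validation scheme, the sketch below shows one way a leave-one-dataset-out split could be constructed with scikit-learn's LeaveOneGroupOut; the participant and dataset assignments are synthetic placeholders, not the study's actual group structure.

```python
# Minimal sketch: a leave-one-dataset-out split, as used for the across-dataset scheme,
# built with scikit-learn's LeaveOneGroupOut. The group assignment here is a synthetic
# placeholder (30 hypothetical participants spread over the five datasets).
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

participant_idx = np.arange(30)                 # stand-in for per-participant data
dataset_id = np.repeat([1, 2, 3, 4, 5], 6)      # which dataset each participant belongs to

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(participant_idx, groups=dataset_id):
    held_out = dataset_id[test_idx][0]
    # Train on four datasets, then decode gaze in all participants of the held-out dataset.
    print(f"held-out dataset {held_out}: {len(train_idx)} train / {len(test_idx)} test participants")
```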
Extended Data Fig. 4 Model performance evaluated before and after exclusion of volumes with unreliable decoding.
Here, before computing model performance, we filtered out either the 0%, 20% or 50% least reliable volumes (that is, those with the highest predicted error (PE)). Model performance is expressed as the coefficient of determination (the R2-score implemented in scikit-learn40) between the true and decoded gaze trajectories for the five key datasets featuring fixation, smooth pursuit (three datasets) and visual search. Note that R2 can range from negative infinity to one. We plot single-participant data (dots) as well as the mean ± standard error of the mean (line plots). Participant dots are additionally color coded according to the participants’ PE.
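The sketch below illustrates this volume-level filtering with synthetic data, using scikit-learn's r2_score; it is not the published code, and the PE values and gaze arrays are simulated stand-ins.

```python
# Minimal sketch with synthetic data (not the published code): drop the 0%, 20% or 50%
# least reliable volumes (highest predicted error) before scoring with scikit-learn's R2.
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n_volumes = 300
true_xy = rng.uniform(-5, 5, size=(n_volumes, 2))             # true gaze (visual degrees)
decoded_xy = true_xy + rng.normal(0, 1, size=true_xy.shape)   # decoded gaze
pe = rng.uniform(0.5, 2.0, size=n_volumes)                    # per-volume predicted error

for drop_fraction in (0.0, 0.2, 0.5):
    # Keep volumes whose PE lies below the (1 - drop_fraction) quantile.
    keep = pe <= np.quantile(pe, 1 - drop_fraction)
    # r2_score averages over the x/y coordinates; it can range from negative infinity to 1.
    score = r2_score(true_xy[keep], decoded_xy[keep])
    print(f"dropped {int(drop_fraction * 100)}% of volumes: R2 = {score:.2f}")
```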
Extended Data Fig. 5 Gaze decoding evaluated using camera-based eye tracking for smooth pursuit datasets 3–4.
Model performance is expressed as the Pearson correlation between the true and decoded gaze trajectories for the datasets with camera-based eye tracking. Because visual search dataset 5 also used labels obtained with camera-based eye tracking, we additionally plot its results here for completeness. Participants are color coded according to predicted error (PE). We plot whisker-box plots for low-PE participants (central line: median, box: 25th and 75th percentiles, whiskers: all data points not considered outliers, outliers: data points outside 1.5x the interquartile range) and single-participant data for all participants (dots).
Extended Data Fig. 6 Normalized test error as a function of how many participants were used for model training plotted for three different viewing behaviors.
We plot single participant data (dots) as well as the across-participant average model performance (black lines). Error bars depict the standard error of the mean. Right panel shows the average across datasets.
Extended Data Fig. 7 Across-participant decoding performance as a function of how much single-participant data was used for model training and of the number of participants in the training data (n=8 and n=20).
We plot the group-level mean (line plots) ± standard error of the mean (error bars) of model performance, expressed as the Pearson correlation and the R-squared score between the real and predicted gaze paths in the test set. For free viewing, model performance saturates with as little as 5–10 minutes of training data. Note that these results likely depend on the viewing behavior and on how similar the behavior is across data partitions and participants.
Extended Data Fig. 8 Sub-imaging decoding resolution.
A) Group results when all 10 sub-TR samples are considered for computing the Pearson correlation between the true and decoded gaze trajectories. Participants are color coded according to predicted error (PE). We plot whisker-box plots for low-PE participants (central line: median, box: 25th and 75th percentiles, whiskers: all data points not considered outliers, outliers: data points outside 1.5x the interquartile range) and single-participant data for all participants (dots). B) The standard deviations of the real and decoded gaze labels within each functional volume (TR) were similar; that is, if the 10 real gaze labels of a TR had a high standard deviation (indicating larger eye movements within this TR), then the 10 decoded gaze labels showed a high standard deviation as well. We plot the Pearson correlation between the within-TR standard deviations, computed using the full time course of each participant, as whisker-box plots (central line: median, box: 25th and 75th percentiles, whiskers: all data points not considered outliers, outliers: data points outside 1.5x the interquartile range) and single-participant data as dots. C, D) Single-participant examples of gaze decoding at a virtual sub-imaging resolution of 10 samples per volume. We plot three example participants with low predicted error (C) and three example participants with high predicted error (D) for the fixation, smooth pursuit and free-viewing datasets11,14,15. Functional-volume onsets are plotted as grey vertical lines.
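A minimal sketch of the within-TR variability comparison in panel B is given below; the real and decoded sub-TR gaze labels are synthetic stand-ins, and the computation uses scipy's pearsonr.

```python
# Minimal sketch with synthetic data: compare within-TR variability of real vs. decoded
# gaze labels at a virtual sub-imaging resolution of 10 samples per volume.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n_volumes, samples_per_tr = 200, 10

# Hypothetical real and decoded gaze labels, 10 sub-TR samples per volume (x/y in degrees).
real = rng.uniform(-5, 5, size=(n_volumes, samples_per_tr, 2))
decoded = real + rng.normal(0, 0.5, size=real.shape)

# Within-TR standard deviation, averaged over x and y, as a proxy for eye movements within a TR.
real_std = real.std(axis=1).mean(axis=-1)
decoded_std = decoded.std(axis=1).mean(axis=-1)

# With real data, a high correlation indicates that TRs containing larger eye movements
# also show more variable decoded labels.
r, p = pearsonr(real_std, decoded_std)
```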
Extended Data Fig. 9 Eyes-open vs. eyes-closed across-participant decoding in smooth pursuit dataset 3.
A) Single-participant example of decoding the proportion of time spent with the eyes closed. Note that model accuracy and hit rates were computed on binarized decoding labels, whereas the model output is the actual proportion of time spent with the eyes closed, as shown in this panel. B) Group-level accuracy for decoding whether the eyes were open or closed for more than 10% of the time it took to acquire the respective functional volume (left panel). We plot whisker-box plots (central line: median, box: 25th and 75th percentiles, whiskers: all data points not considered outliers, outliers: data points outside 1.5x the interquartile range) and single-participant data (dots). We calculated balanced accuracy to rule out that the results merely reflect the model always predicting the most common label. In addition, we plot a receiver operating characteristic (ROC) curve of the group-level data (right panel).
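The sketch below illustrates this scoring step with synthetic data: the continuous decoder output is binarized at the 10% threshold and evaluated with scikit-learn's balanced accuracy and ROC utilities; it is an illustration under these assumptions, not the published evaluation code.

```python
# Minimal sketch with synthetic data: binarize the decoded proportion of eyes-closed time
# at 10% per functional volume, then score with balanced accuracy and an ROC curve.
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score, roc_curve

rng = np.random.default_rng(3)
n_volumes = 400

true_closed = rng.uniform(0, 1, size=n_volumes)                                    # true proportion eyes closed per TR
decoded_closed = np.clip(true_closed + rng.normal(0, 0.2, size=n_volumes), 0, 1)   # decoded proportion

true_label = true_closed > 0.10       # eyes closed for >10% of the volume acquisition
pred_label = decoded_closed > 0.10

# Balanced accuracy guards against the model simply predicting the most common label.
bacc = balanced_accuracy_score(true_label, pred_label)

# ROC curve and AUC computed on the continuous decoder output.
fpr, tpr, thresholds = roc_curve(true_label, decoded_closed)
auc = roc_auc_score(true_label, decoded_closed)
```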
Extended Data Fig. 10 General-linear-model (GLM) group results for the contrast ‘Far vs. short eye movements’ during visual search without accounting for the hemodynamic response function.
We plot the F-statistic of this contrast superimposed on a template surface (fsaverage) for gaze labels obtained with camera-based eye tracking (first panel) as well as for three DeepMReye cross-validation schemes. Within-participants: all participants of a dataset were included, with different data partitions used for model training and testing. Across-participants: different participants were included during model training and testing. Across-datasets: different datasets (and hence also different participants) were included during model training and testing.
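As an illustration of one way such an eye-movement regressor could be derived from a decoded gaze trajectory, the sketch below computes the Euclidean distance between consecutive gaze positions per volume; this is a hedged, hypothetical example, not the GLM code used in the paper.

```python
# Illustrative sketch (not the published GLM code): derive a per-volume eye-movement
# amplitude regressor from a decoded gaze trajectory, e.g. as the basis for a
# 'far vs. short eye movements' contrast. The gaze trajectory here is synthetic.
import numpy as np

rng = np.random.default_rng(4)
gaze = rng.uniform(-5, 5, size=(300, 2))   # decoded gaze per volume (visual degrees)

# Euclidean distance between consecutive gaze positions as movement amplitude per volume.
amplitude = np.r_[0.0, np.linalg.norm(np.diff(gaze, axis=0), axis=1)]

# Z-score so the regressor can enter a GLM design matrix alongside other regressors.
amplitude_z = (amplitude - amplitude.mean()) / amplitude.std()
```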
Supplementary information
Supplementary Information
Supplementary Figs. 1–5 and Tables 1 and 2.
Source data
Source Data Fig. 1
Multivoxel pattern of the eyeballs.
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Unthresholded statistical maps.
Source Data Extended Data Fig. 1
Statistical source data.
Source Data Extended Data Fig. 2
Statistical source data.
Source Data Extended Data Fig. 3
Statistical source data.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 7
Statistical source data.
Source Data Extended Data Fig. 8
Statistical source data.
Source Data Extended Data Fig. 9
Statistical source data.
Source Data Extended Data Fig. 10
Unthresholded statistical maps.
Rights and permissions
About this article
Cite this article
Frey, M., Nau, M. & Doeller, C.F. Magnetic resonance-based eye tracking using deep neural networks. Nat Neurosci 24, 1772–1779 (2021). https://doi.org/10.1038/s41593-021-00947-w
Further reading
No camera needed with MR-based eye tracking. Nature Neuroscience (2021).