Magnetic resonance-based eye tracking using deep neural networks

Abstract

Viewing behavior provides a window into many central aspects of human cognition and health, and it is an important variable of interest or confound in many functional magnetic resonance imaging (fMRI) studies. To make eye tracking freely and widely available for MRI research, we developed DeepMReye, a convolutional neural network (CNN) that decodes gaze position from the magnetic resonance signal of the eyeballs. It performs cameraless eye tracking at subimaging temporal resolution in held-out participants with little training data and across a broad range of scanning protocols. Critically, it works even in existing datasets and when the eyes are closed. Decoded eye movements explain network-wide brain activity also in regions not associated with oculomotor function. This work emphasizes the importance of eye tracking for the interpretation of fMRI results and provides an open source software solution that is widely applicable in research and clinical settings.
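The model architecture and training procedure are detailed in Fig. 1 and the Methods; purely as an illustrative sketch of the general approach (all layer sizes, the input shape and the function name are placeholders, not the DeepMReye architecture), a convolutional network that regresses gaze coordinates from eyeball voxel patterns could look as follows:

    # Illustrative sketch only: a small 3D CNN that maps eyeball voxel patterns of
    # one functional volume to (x, y) gaze coordinates. Input shape and layer sizes
    # are placeholders, not the published DeepMReye architecture.
    import tensorflow as tf
    from tensorflow.keras import layers

    def build_toy_gaze_decoder(input_shape=(24, 24, 16, 1)):
        inp = layers.Input(shape=input_shape)               # eyeball voxels of one volume
        x = layers.Conv3D(16, 3, padding="same", activation="relu")(inp)
        x = layers.MaxPool3D(2)(x)
        x = layers.Conv3D(32, 3, padding="same", activation="relu")(x)
        x = layers.GlobalAveragePooling3D()(x)
        x = layers.Dense(64, activation="relu")(x)
        out = layers.Dense(2)(x)                            # horizontal and vertical gaze position
        model = tf.keras.Model(inp, out)
        model.compile(optimizer="adam", loss="mse")
        return model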

Fig. 1: Model architecture and input.
Fig. 2: Across-participant gaze decoding results.
Fig. 3: Effect of scan parameters and eye tracking while the eyes are closed.
Fig. 4: Decoded viewing behavior explains network-wide brain activity.

Data availability

We analyzed data from multiple previous reports, which can be requested from the respective authors. Dataset 1 is part of a larger data-sharing initiative and can be downloaded at http://fcon_1000.projects.nitrc.org. We further share exemplary data to illustrate our pipeline (see also the ‘Code availability’ statement) as well as the source data for Figs. 1–4 and Extended Data Figs. 1–10 online on the Open Science Framework (https://doi.org/10.17605/OSF.IO/MRHK9). Moreover, we share pretrained model weights estimated on the datasets used in the present work. These model weights allow decoding of viewing behavior without retraining the model in certain scenarios (see the online documentation at https://github.com/DeepMReye for more details). Source data are provided with this paper.
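The pretrained weights are meant to be used through the DeepMReye package and its online documentation; purely as a hypothetical illustration of the workflow (the file and variable names are placeholders, not the released weights), reusing shared weights instead of retraining could look like this with a Keras model such as the sketch above:

    # Hypothetical illustration: load shared weights into a compatible model and
    # decode gaze for new data. "pretrained_weights.h5" and "new_eyeball_voxels"
    # are placeholders, not the actual release or API.
    model = build_toy_gaze_decoder()
    model.load_weights("pretrained_weights.h5")          # weights estimated on other datasets
    predicted_gaze = model.predict(new_eyeball_voxels)   # shape: (n_volumes, 2)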

Code availability

The DeepMReye model code can be found on GitHub (https://github.com/DeepMReye), along with user documentation and a frequently asked questions page. Moreover, we share Colab notebooks that illustrate the use of DeepMReye with exemplary data. Finally, we share exemplary eye-tracking calibration scripts that can easily be adapted to acquire training data for DeepMReye.
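The shared calibration scripts are stimulus-presentation code for acquiring training labels; as a minimal, display-agnostic sketch (grid extent, spacing and timing are assumptions, not the shared scripts), a fixation-target sequence for such a calibration run could be generated as follows:

    # Minimal sketch: generate a pseudo-random grid of fixation targets with onset
    # times, to be shown during scanning and later used as training labels.
    # Grid extent, spacing and timing are assumptions, not the shared scripts.
    import numpy as np

    rng = np.random.default_rng(0)
    xs, ys = np.meshgrid(np.linspace(-8, 8, 5), np.linspace(-6, 6, 5))  # visual degrees
    targets = np.column_stack([xs.ravel(), ys.ravel()])
    rng.shuffle(targets)                       # pseudo-random target order
    onsets = np.arange(len(targets)) * 2.0     # one target every 2 s (placeholder)

    np.savetxt("fixation_targets.csv",
               np.column_stack([onsets, targets]),
               delimiter=",", header="onset_s,x_deg,y_deg", comments="")

During scanning, each target would be displayed at the listed coordinate from its onset time onward, and the same file could then serve as the label file for model training.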

References

  1. Anderson, T. J. & MacAskill, M. R. Eye movements in patients with neurodegenerative disorders. Nat. Rev. Neurol. 9, 74–85 (2013).

  2. Morrone, M. C., Ross, J. & Burr, D. Saccadic eye movements cause compression of time as well as space. Nat. Neurosci. 8, 950–954 (2005).

  3. Berman, R. A. et al. Cortical networks subserving pursuit and saccadic eye movements in humans: an fMRI study. Hum. Brain Mapp. 8, 209–225 (1999).

  4. Petit, L. & Haxby, J. V. Functional anatomy of pursuit eye movements in humans as revealed by fMRI. J. Neurophysiol. 82, 463–471 (1999).

  5. McNabb, C. B. et al. Inter-slice leakage and intra-slice aliasing in simultaneous multi-slice echo-planar images. Brain Struct. Funct. 225, 1153–1158 (2020).

  6. Voss, J. L., Bridge, D. J., Cohen, N. J. & Walker, J. A. A closer look at the hippocampus and memory. Trends Cogn. Sci. 21, 577–588 (2017).

  7. Tregellas, J. R., Tanabe, J. L., Miller, D. E. & Freedman, R. Monitoring eye movements during fMRI tasks with echo planar images. Hum. Brain Mapp. 17, 237–243 (2002).

  8. Beauchamp, M. S. Detection of eye movements from fMRI data. Magn. Reson. Med. 49, 376–380 (2003).

  9. Heberlein, K., Hu, X., Peltier, S. & LaConte, S. Predictive eye estimation regression (PEER) for simultaneous eye tracking and fMRI. In Proc. 14th Scientific Meeting, International Society for Magnetic Resonance in Medicine 14, 2808 (2006).

  10. Son, J. et al. Evaluating fMRI-based estimation of eye gaze during naturalistic viewing. Cereb. Cortex 30, 1171–1184 (2020).

  11. Alexander, L. M. et al. An open resource for transdiagnostic research in pediatric mental health and learning disorders. Sci. Data 4, 170181 (2017).

  12. Nau, M., Schindler, A. & Bartels, A. Real-motion signals in human early visual cortex. Neuroimage 175, 379–387 (2018).

  13. Polti, I., Nau, M., Kaplan, R., van Wassenhove, V. & Doeller, C. F. Hippocampus and striatum encode distinct task regularities that guide human timing behavior. Preprint at bioRxiv https://doi.org/10.1101/2021.08.03.454928 (2021).

  14. Nau, M., Navarro Schröder, T., Bellmund, J. L. & Doeller, C. F. Hexadirectional coding of visual space in human entorhinal cortex. Nat. Neurosci. 21, 188–190 (2018).

  15. Julian, J. B., Keinath, A. T., Frazzetta, G. & Epstein, R. A. Human entorhinal cortex represents visual space using a boundary-anchored grid. Nat. Neurosci. 21, 191–194 (2018).

  16. Ehinger, K. A., Hidalgo-Sotelo, B., Torralba, A. & Oliva, A. Modelling search for people in 900 scenes: a combined source model of eye guidance. Vis. Cogn. 17, 945–978 (2009).

  17. Wolfe, J. M. Visual search: how do we find what we are looking for? Annu. Rev. Vis. Sci. 6, 539–562 (2020).

  18. Hebart, M. N. et al. THINGS: a database of 1,854 object concepts and more than 26,000 naturalistic object images. PLoS ONE 14, e0223792 (2019).

  19. Duchowski, A. T. Eye Tracking Methodology: Theory and Practice 3rd edn (Springer International Publishing, 2017).

  20. Brodoehl, S., Witte, O. W. & Klingner, C. M. Measuring eye states in functional MRI. BMC Neurosci. 17, 48 (2016).

  21. Coiner, B. et al. Functional neuroanatomy of the human eye movement network: a review and atlas. Brain Struct. Funct. 224, 2603–2617 (2019).

  22. Keck, I. R., Fischer, V., Puntonet, C. G. & Lang, E. W. Eye movement quantification in functional MRI data by spatial independent component analysis. In International Conference on Independent Component Analysis and Signal Separation Vol. 5441 (eds Adali, T., Jutten, C., Romano, J. M. T. & Barros, A. K.) 435–442 (Springer Berlin Heidelberg, 2009).

  23. Franceschiello, B. et al. 3-Dimensional magnetic resonance imaging of the freely moving human eye. Prog. Neurobiol. 194, 101885 (2020).

  24. LaConte, S. M. & Glielmi, C. B. Verifying visual fixation to improve fMRI with predictive eye estimation regression (PEER). In Proc. 15th Scientific Meeting, International Society for Magnetic Resonance in Medicine, Berlin 3438 (2007).

  25. Sathian, K. et al. Dual pathways for haptic and visual perception of spatial and texture information. Neuroimage 57, 462–475 (2011).

  26. O’Connell, T. P. & Chun, M. M. Predicting eye movement patterns from fMRI responses to natural scenes. Nat. Commun. 9, 5159 (2018).

  27. Esteban, O. et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods 16, 111–116 (2019).

  28. Tagliazucchi, E. & Laufs, H. Decoding wakefulness levels from typical fMRI resting-state data reveals reliable drifts between wakefulness and sleep. Neuron 82, 695–708 (2014).

  29. Naselaris, T., Kay, K. N., Nishimoto, S. & Gallant, J. L. Encoding and decoding in fMRI. Neuroimage 56, 400–410 (2011).

  30. Kriegeskorte, N. & Douglas, P. K. Interpreting encoding and decoding models. Curr. Opin. Neurobiol. 55, 167–179 (2019).

  31. Sonkusare, S., Breakspear, M. & Guo, C. Naturalistic stimuli in neuroscience: critically acclaimed. Trends Cogn. Sci. 23, 699–714 (2019).

  32. Lim, S.-L., O’Doherty, J. P. & Rangel, A. The decision value computations in the vmPFC and striatum use a relative value code that is guided by visual attention. J. Neurosci. 31, 13214–13223 (2011).

  33. Koba, C., Notaro, G., Tamm, S., Nilsonne, G. & Hasson, U. Spontaneous eye movements during eyes-open rest reduce resting-state-network modularity by increasing visual-sensorimotor connectivity. Netw. Neurosci. 5, 451–476 (2021).

  34. Murphy, K., Birn, R. M. & Bandettini, P. A. Resting-state fMRI confounds and cleanup. Neuroimage 80, 349–359 (2013).

  35. Frey, M. et al. Interpreting wide-band neural activity using convolutional neural networks. eLife 10, e66551 (2021).

  36. Shen, D., Wu, G. & Suk, H. I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248 (2017).

  37. Misra, D. Mish: a self regularized non-monotonic neural activation function. Preprint at https://arxiv.org/abs/1908.08681 (2019).

  38. Biewald, L. Experiment tracking with Weights & Biases. http://wandb.com/ (2020).

  39. Kingma, D. P. & Ba, J. L. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).

  40. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

Acknowledgements

We thank I. Polti, J.B. Julian, R. Epstein and A. Bartels for providing imaging and eye-tracking data that were used in the present work. We further thank C.I. Baker and C. Barry for helpful discussions and J.B. Julian and C.I. Baker for comments on an earlier version of this manuscript. This work is supported by the European Research Council (ERC-CoG GEOCOG 724836). C.F.D.’s research is further supported by the Max Planck Society, the Kavli Foundation, the Centre of Excellence scheme of the Research Council of Norway, Centre for Neural Computation (223262/F50), The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits and the National Infrastructure scheme of the Research Council of Norway, NORBRAIN (197467/F50).

Author information

Contributions

M.F. and M.N. conceptualized the present work, developed the decoding pipeline and analyzed the data with input from C.F.D. M.F. wrote the key model implementation code with help from M.N. M.N. acquired most and analyzed all datasets, visualized the results and wrote the manuscript with help from M.F. M.F., M.N. and C.F.D. discussed the results and contributed to the manuscript. M.F. and M.N. share first authorship. M.N. and C.F.D. share senior authorship.

Corresponding authors

Correspondence to Markus Frey or Matthias Nau.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Neuroscience thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Predicted error (PE) correlates with the Euclidean error between real and predicted gaze positions.

This allows filtering the test set post-decoding based on estimated reliability. A) Results for models trained and tested using the fixation target coordinates. B) Results for models trained and tested using labels acquired with camera-based eye tracking. A, B) We plot single-participant data (dots) with a regression line as well as group-level whisker-box plots (central line: median; box: 25th and 75th percentiles; whiskers: all data points not considered outliers; outliers: data points outside 1.5x the interquartile range). Participants were split into the 80% most reliable (low PE, blue) and the 20% least reliable (high PE, orange). All scores are expressed in visual degrees.
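As a minimal sketch of this reliability check (the arrays are placeholders; the 80/20 split follows the caption), the decoding error can be related to the model's predicted error across participants and then used to split them:

    # Sketch: relate the model's predicted error (PE) to the actual Euclidean
    # error and split participants into low- and high-PE groups (80%/20%).
    import numpy as np
    from scipy.stats import pearsonr

    # placeholders: true_gaze, decoded_gaze -> (n_samples, 2) for one participant
    mean_error = np.linalg.norm(true_gaze - decoded_gaze, axis=1).mean()

    # across participants: predicted_error and mean_errors are 1D arrays
    r, p = pearsonr(predicted_error, mean_errors)

    cutoff = np.percentile(predicted_error, 80)
    low_pe = predicted_error <= cutoff     # 80% most reliable participants
    high_pe = predicted_error > cutoff     # 20% least reliable participants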

Source data

Extended Data Fig. 2 Quantifying gaze decoding in high predicted error and out-of-sample participants.

A) Group-level gaze decoding results expressed as the coefficient of determination (R2). The top panel shows the R2 score as implemented in scikit-learn (ref. 40) between the true and decoded gaze trajectory for the five key datasets featuring fixations, 3x smooth pursuit and visual search. Note that R2 can range from negative infinity to one. Participants are color coded according to predicted error (PE). We plot whisker-box plots for low-PE participants (central line: median; box: 25th and 75th percentiles; whiskers: all data points not considered outliers; outliers: data points outside 1.5x the interquartile range) and single-participant data for all participants (dots). B) Group-average spread of decoded positions around true positions, collapsed over time and expressed in visual degrees, for participants with high predicted error (orange dots in A).
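The R2 metric referenced here is the scikit-learn implementation (ref. 40); a minimal sketch of scoring one participant's decoded trajectory, with placeholder array names and an assumed (n_samples, 2) shape:

    # Sketch: coefficient of determination between true and decoded gaze,
    # averaged over the horizontal and vertical coordinate.
    # r2_score can be arbitrarily negative and is at most 1.
    from sklearn.metrics import r2_score

    # placeholders: true_gaze, decoded_gaze -> arrays of shape (n_samples, 2)
    r2 = r2_score(true_gaze, decoded_gaze, multioutput="uniform_average")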

Source data

Extended Data Fig. 3 Model evaluation across different decoding schemes.

A) Within-participant gaze decoding obtained by training and testing the model on different data partitions of all participants within a dataset. B) Across-dataset gaze decoding obtained using leave-one-dataset-out cross-validation. We plot the R2 score as implemented in scikit-learn (ref. 40) between the true and decoded gaze trajectory for the five key datasets featuring fixations, 3x smooth pursuit and visual search. Note that R2 can range from negative infinity to one. The results of datasets 1–3 were obtained using the fixation target labels; those of datasets 4–5 were obtained using camera-based eye-tracking labels. Participants are color coded according to predicted error (PE). A, B) We plot whisker-box plots for low-PE participants (central line: median; box: 25th and 75th percentiles; whiskers: all data points not considered outliers; outliers: data points outside 1.5x the interquartile range) and single-participant data for all participants (dots).
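As an illustration of the across-dataset scheme in panel B (a sketch only; the model and the feature, label and group arrays are placeholders, not the DeepMReye training code), leave-one-dataset-out cross-validation can be written with scikit-learn's LeaveOneGroupOut splitter:

    # Sketch: leave-one-dataset-out cross-validation. "dataset_id" assigns each
    # sample (here, each fMRI volume) to one of the five datasets.
    import numpy as np
    from sklearn.model_selection import LeaveOneGroupOut
    from sklearn.metrics import r2_score

    logo = LeaveOneGroupOut()
    scores = []
    for train_idx, test_idx in logo.split(X, y, groups=dataset_id):
        model = build_toy_gaze_decoder()                  # placeholder model (see sketch above)
        model.fit(X[train_idx], y[train_idx], epochs=10, verbose=0)
        scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
    print(np.mean(scores))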

Source data

Extended Data Fig. 4 Model performance evaluated before and after exclusion of volumes with unreliable decoding.

Here, before computing model performance, we filtered out either the 0%, 20% or 50% least reliable volumes (that is, those with the highest predicted error (PE)). Model performance is expressed as the coefficient of determination (R2 score) as implemented in scikit-learn (ref. 40) between the true and decoded gaze trajectory for the five key datasets featuring fixations, 3x smooth pursuit and visual search. Note that R2 can range from negative infinity to one. We plot single-participant data (dots) as well as the mean ± standard error of the mean (line plots). Participant dots are additionally color coded according to the participants’ PE.
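A minimal sketch of this exclusion step (array names are placeholders; the thresholds follow the caption): keep only the volumes whose predicted error falls below a given percentile before scoring.

    # Sketch: recompute R2 after discarding the 0%, 20% or 50% of volumes with
    # the highest predicted error (PE). volume_pe holds one PE value per volume.
    import numpy as np
    from sklearn.metrics import r2_score

    for exclude in (0, 20, 50):
        keep = volume_pe <= np.percentile(volume_pe, 100 - exclude)
        print(exclude, r2_score(true_gaze[keep], decoded_gaze[keep]))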

Source data

Extended Data Fig. 5 Gaze decoding evaluated using camera-based eye tracking for smooth pursuit datasets 3–4.

Model performance expressed as the Pearson correlation between the true and decoded gaze trajectory for the datasets with camera-based eye tracking. Because visual search dataset 5 also used labels obtained with camera-based eye tracking, we plot its results again here for completeness. Participants are color coded according to predicted error (PE). We plot whisker-box plots for low-PE participants (central line: median; box: 25th and 75th percentiles; whiskers: all data points not considered outliers; outliers: data points outside 1.5x the interquartile range) and single-participant data for all participants (dots).

Source data

Extended Data Fig. 6 Normalized test error as a function of the number of participants used for model training, plotted for three different viewing behaviors.

We plot single-participant data (dots) as well as the across-participant average model performance (black lines). Error bars depict the standard error of the mean. The right panel shows the average across datasets.

Source data

Extended Data Fig. 7 Across-participant decoding performance as a function of the amount of single-participant data used for model training and of the number of participants in the training data (n = 8 and n = 20).

We plot the group-level mean (line plots) ± standard error of the mean (error bars) of model performance, expressed as the Pearson correlation and the R-squared score between the real and predicted gaze path in the test set. For free viewing, model performance saturates with as little as 5–10 minutes of training data. Note that these results likely depend on the viewing behavior and on how similar that behavior is across data partitions and participants.

Source data

Extended Data Fig. 8 Sub-imaging decoding resolution.

A) Group results when all 10 sub-TR samples are considered for computing the Pearson correlation between the true and decoded gaze trajectories. Participants are color coded according to predicted error (PE). We plot whisker-box plots for low-PE participants (central line: median; box: 25th and 75th percentiles; whiskers: all data points not considered outliers; outliers: data points outside 1.5x the interquartile range) and single-participant data for all participants (dots). B) The standard deviation of the real and decoded gaze labels was similar within each functional volume (TR); that is, if the 10 real gaze labels of a TR had a high standard deviation (indicating larger eye movements within that TR), then the 10 decoded gaze labels showed a high standard deviation as well. We plot the Pearson correlation between the within-TR standard deviations, computed using the full time course of each participant, as whisker-box plots (central line: median; box: 25th and 75th percentiles; whiskers: all data points not considered outliers; outliers: data points outside 1.5x the interquartile range) and single-participant data as dots. C, D) Single-participant examples of gaze decoding at a virtual sub-imaging resolution of 10 samples per volume. We plot three example participants with low predicted error (C) and three example participants with high predicted error (D) for the fixation, smooth pursuit and free-viewing datasets (refs. 11,14,15). Functional-volume onsets are plotted as grey vertical lines.
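A sketch of the within-TR variability check in panel B (the (n_volumes, 10, 2) array shape is an assumption): with 10 decoded samples per volume, the spread of the real and the decoded sub-TR labels can be computed per volume and correlated over the run.

    # Sketch: compare within-TR gaze variability of real and decoded labels.
    # Assumed shapes: (n_volumes, 10, 2) -> 10 sub-TR samples of (x, y) per volume.
    import numpy as np
    from scipy.stats import pearsonr

    real_std = real_subtr.std(axis=1).mean(axis=-1)       # one value per volume
    decoded_std = decoded_subtr.std(axis=1).mean(axis=-1)
    r, p = pearsonr(real_std, decoded_std)                # per-participant correlation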

Source data

Extended Data Fig. 9 Eyes-open vs. eyes-closed across-participant decoding in smooth pursuit dataset 3.

A) Single-participant example of decoding the proportion of time spent with the eyes closed. Note that model accuracy and hit rates were computed on binarized decoding labels, but that the model output is the actual proportion of time spent with the eyes closed, as shown in this panel. B) Group-level accuracy for decoding whether the eyes were open or closed for more than 10% of the time it took to acquire the respective functional volume (left panel). We plot whisker-box plots (central line: median; box: 25th and 75th percentiles; whiskers: all data points not considered outliers; outliers: data points outside 1.5x the interquartile range) and single-participant data (dots). We calculated balanced accuracy to rule out that the results reflect the model always classifying the most common label. In addition, we plot a receiver operating characteristic (ROC) curve of the group-level data (right panel).
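A minimal sketch of the evaluation in panel B (array names are placeholders; the 10% threshold follows the caption): binarize the decoded proportion, score it with balanced accuracy, and compute the ROC curve from the continuous output.

    # Sketch: binarize the decoded proportion of eyes-closed time per volume,
    # then compute balanced accuracy and an ROC curve against the true labels.
    from sklearn.metrics import balanced_accuracy_score, roc_curve, auc

    true_closed = true_proportion_closed > 0.10        # eyes closed >10% of the TR
    pred_closed = decoded_proportion_closed > 0.10

    bacc = balanced_accuracy_score(true_closed, pred_closed)
    fpr, tpr, _ = roc_curve(true_closed, decoded_proportion_closed)
    roc_auc = auc(fpr, tpr)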

Source data

Extended Data Fig. 10 General-linear-model (GLM) group results for the contrast ‘Far vs. short eye movements’ during visual search without accounting for the hemodynamic response function.

We plot the F-statistic of this contrast superimposed on a template surface (fsaverage) for gaze labels obtained with camera-based eye tracking (first panel) as well as for three DeepMReye cross-validation schemes. Within-participants: all participants of a dataset were included, with different partitions used for model training and test. Across-participants: different participants were included during model training and test. Across-datasets: different datasets (and hence also different participants) were included during model training and test.

Source data

Supplementary information

Supplementary Information

Supplementary Figs. 1–5 and Tables 1 and 2.

Reporting Summary

Source data

Source Data Fig. 1

Multivoxel pattern of the eyeballs.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Unthresholded statistical maps.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 9

Statistical source data.

Source Data Extended Data Fig. 10

Unthresholded statistical maps.

About this article

Cite this article

Frey, M., Nau, M. & Doeller, C.F. Magnetic resonance-based eye tracking using deep neural networks. Nat Neurosci 24, 1772–1779 (2021). https://doi.org/10.1038/s41593-021-00947-w
