Extensive sampling of neural activity during rich cognitive phenomena is critical for robust understanding of brain function. Here we present the Natural Scenes Dataset (NSD), in which high-resolution functional magnetic resonance imaging responses to tens of thousands of richly annotated natural scenes were measured while participants performed a continuous recognition task. To optimize data quality, we developed and applied novel estimation and denoising techniques. Simple visual inspections of the NSD data reveal clear representational transformations along the ventral visual pathway. Further exemplifying the inferential power of the dataset, we used NSD to build and train deep neural network models that predict brain activity more accurately than state-of-the-art models from computer vision. NSD also includes substantial resting-state and diffusion data, enabling network neuroscience perspectives to constrain and enhance models of perception and memory. Given its unprecedented scale, quality and breadth, NSD opens new avenues of inquiry in cognitive neuroscience and artificial intelligence.
The NSD dataset is freely available at http://naturalscenesdataset.org. The data are hosted in the cloud, allowing researchers to exploit high-performance cloud computing to efficiently analyze the dataset. We provide both raw data in BIDS format86 and prepared data files, along with extensive technical documentation in the NSD Data Manual. To ensure strict validation for an upcoming Algonauts prediction challenge87, the initial public release will withhold the last three NSD scan sessions from each participant (approximately 8.4% of the NSD data). Images used for the NSD were taken from the Common Objects in Context database14 (https://cocodataset.org).
We provide an archive of code used in this study (https://github.com/cvnlab/nsddatapaper/) as well as utility functions for working with the prepared NSD data (https://github.com/cvnlab/nsdcode/). Custom algorithms developed for this study include GLMsingle (https://github.com/cvnlab/GLMsingle/) and fracridge (https://github.com/nrdg/fracridge/). Example scripts demonstrating scientific analyses of the NSD data are available (https://github.com/cvnlab/nsdexamples/); these scripts might be useful for teaching purposes.
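As a concrete illustration of the fractional ridge regression idea behind fracridge, the following is a minimal numpy sketch (not the library's actual, much faster implementation): it finds, by bisection, the regularization level alpha at which the ridge coefficient norm equals a target fraction of the unregularized (OLS) coefficient norm.

```python
import numpy as np

def ridge_coef(X, y, alpha):
    """Closed-form ridge solution: (X'X + alpha*I)^-1 X'y."""
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_feat), X.T @ y)

def frac_ridge(X, y, frac, alpha_hi=1e8, tol=1e-6):
    """Find alpha such that ||beta_ridge|| / ||beta_OLS|| ~= frac (0 < frac <= 1).
    Bisection works because the coefficient norm shrinks monotonically
    as alpha grows."""
    norm_ols = np.linalg.norm(ridge_coef(X, y, 0.0))
    lo, hi = 0.0, alpha_hi
    mid = 0.5 * (lo + hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        ratio = np.linalg.norm(ridge_coef(X, y, mid)) / norm_ols
        if abs(ratio - frac) < tol:
            break
        if ratio > frac:      # too little shrinkage -> larger alpha
            lo = mid
        else:
            hi = mid
    return ridge_coef(X, y, mid), mid

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)

# request coefficients whose norm is half the OLS norm
beta_half, alpha_half = frac_ridge(X, y, frac=0.5)
norm_ratio = np.linalg.norm(beta_half) / np.linalg.norm(ridge_coef(X, y, 0.0))
```

The appeal of the fractional parameterization is that the fraction is directly interpretable and spans the useful range of shrinkage uniformly, whereas useful values of alpha vary by orders of magnitude across voxels.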
de Vries, S. E. J. et al. A large-scale standardized physiological survey reveals functional organization of the mouse visual cortex. Nat. Neurosci. 23, 138–151 (2020).
Siegle, J. H. et al. Survey of spiking in the mouse visual system reveals functional hierarchy. Nature 592, 86–92 (2021).
Stringer, C., Pachitariu, M., Steinmetz, N., Carandini, M. & Harris, K. D. High-dimensional geometry of population responses in visual cortex. Nature 571, 361–365 (2019).
Markram, H. et al. Reconstruction and simulation of neocortical microcircuitry. Cell 163, 456–492 (2015).
Van Essen, D. C. et al. The WU-Minn human connectome project: an overview. Neuroimage 80, 62–79 (2013).
Zheng, Z. et al. A complete electron microscopy volume of the brain of adult Drosophila melanogaster. Cell 174, 730–743 (2018).
Van Essen, D. C. et al. Mapping visual cortex in monkeys and humans using surface-based atlases. Vis. Res. 41, 1359–1378 (2001).
Grill-Spector, K. & Malach, R. The human visual cortex. Annu. Rev. Neurosci. 27, 649–677 (2004).
Wheeler, M. E., Petersen, S. E. & Buckner, R. L. Memory’s echo: vivid remembering reactivates sensory-specific cortex. Proc. Natl Acad. Sci. USA 97, 11125–11129 (2000).
Breedlove, J. L., St-Yves, G., Olman, C. A. & Naselaris, T. Generative feedback explains distinct brain activity codes for seen and mental images. Curr. Biol. 30, 2211–2224 (2020).
Kay, K. N., Weiner, K. S. & Grill-Spector, K. Attention reduces spatial uncertainty in human ventral temporal cortex. Curr. Biol. 25, 595–600 (2015).
Huth, A. G., Nishimoto, S., Vu, A. T. & Gallant, J. L. A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron 76, 1210–1224 (2012).
Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (University of Toronto, 2009).
Lin, T.-Y. et al. Microsoft COCO: Common Objects in Context. European Conference on Computer Vision. https://link.springer.com/chapter/10.1007/978-3-319-10602-1_48, 740–755 (Springer, 2014).
Güçlü, U. & van Gerven, M. A. J. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. J. Neurosci. 35, 10005–10014 (2015).
Khaligh-Razavi, S.-M. & Kriegeskorte, N. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput. Biol. 10, e1003915 (2014).
Seeliger, K. et al. End-to-end neural system identification with neural information flow. PLoS Comput. Biol. 17, e1008558 (2021).
Stansbury, D. E., Naselaris, T. & Gallant, J. L. Natural scene statistics account for the representation of scene categories in human visual cortex. Neuron 79, 1025–1034 (2013).
St-Yves, G. & Naselaris, T. The feature-weighted receptive field: an interpretable encoding model for complex feature spaces. Neuroimage 180, 188–202 (2018).
Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111, 8619–8624 (2014).
Naselaris, T. et al. Cognitive computational neuroscience: a new conference for an emerging discipline. Trends Cogn. Sci. 22, 365–367 (2018).
Chang, N. et al. BOLD5000, a public fMRI dataset while viewing 5000 visual images. Sci. Data 6, 49 (2019).
Horikawa, T. & Kamitani, Y. Generic decoding of seen and imagined objects using hierarchical visual features. Nat. Commun. 8, 15037 (2017).
Kay, K. N., Naselaris, T., Prenger, R. J. & Gallant, J. L. Identifying natural images from human brain activity. Nature 452, 352–355 (2008).
Triantafyllou, C. et al. Comparison of physiological noise at 1.5 T, 3 T and 7 T and optimization of fMRI acquisition parameters. Neuroimage 26, 243–250 (2005).
Brady, T. F., Konkle, T., Alvarez, G. A. & Oliva, A. Visual long-term memory has a massive storage capacity for object details. Proc. Natl Acad. Sci. USA 105, 14325–14329 (2008).
Haxby, J. V., Guntupalli, J. S., Nastase, S. A. & Feilong, M. Hyperalignment: modeling shared information encoded in idiosyncratic cortical topographies. eLife 9, e56601 (2020).
Power, J. D., Lynch, C. J., Adeyemo, B. & Petersen, S. E. A critical, event-related appraisal of denoising in resting-state fMRI studies. Cereb. Cortex 30, 5544–5559 (2020).
Roth, Z. N., Ryoo, M. & Merriam, E. P. Task-related activity in human visual cortex. PLoS Biol. 18, e3000921 (2020).
Benson, N. C. et al. The human connectome project 7 Tesla retinotopy dataset: description and population receptive field analysis. J. Vis. 18, 23 (2018).
Stigliani, A., Weiner, K. S. & Grill-Spector, K. Temporal processing capacity in high-level visual cortex is domain specific. J. Neurosci. 35, 12412–12424 (2015).
Kay, K. et al. A critical assessment of data quality and venous effects in sub-millimeter fMRI. Neuroimage 189, 847–869 (2019).
Gordon, E. M. et al. Precision functional mapping of individual human brains. Neuron 95, 791–807 (2017).
Kang, X., Yund, E. W., Herron, T. J. & Woods, D. L. Improving the resolution of functional brain imaging: analyzing functional data in anatomical space. Magn. Reson. Imaging 25, 1070–1078 (2007).
Kay, K. N., Rokem, A., Winawer, J., Dougherty, R. F. & Wandell, B. GLMdenoise: a fast, automated technique for denoising task-based fMRI data. Front. Neurosci. 7, 247 (2013).
Rokem, A. & Kay, K. Fractional ridge regression: a fast, interpretable reparameterization of ridge regression. Gigascience 9, giaa133 (2020).
Albrecht, D. G. & Hamilton, D. B. Striate cortex of monkey and cat: contrast response function. J. Neurophysiol. 48, 217–237 (1982).
Wagner, A. D., Shannon, B. J., Kahn, I. & Buckner, R. L. Parietal lobe contributions to episodic memory retrieval. Trends Cogn. Sci. 9, 445–453 (2005).
Spaniol, J. et al. Event-related fMRI studies of episodic encoding and retrieval: meta-analyses using activation likelihood estimation. Neuropsychologia 47, 1765–1779 (2009).
Gonzalez-Castillo, J. et al. Whole-brain, time-locked activation with simple tasks revealed using massive averaging and model-free analysis. Proc. Natl Acad. Sci. USA 109, 5487–5492 (2012).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Connolly, A. C. et al. The representation of biological classes in the human brain. J. Neurosci. 32, 2608–2618 (2012).
Naselaris, T., Stansbury, D. E. & Gallant, J. L. Cortical representation of animate and inanimate objects in complex natural scenes. J. Physiol. Paris 106, 239–249 (2012).
Long, B., Yu, C.-P. & Konkle, T. Mid-level visual features underlie the high-level categorical organization of the ventral stream. Proc. Natl Acad. Sci. USA 115, E9015–E9024 (2018).
Henriksson, L., Khaligh-Razavi, S.-M., Kay, K. & Kriegeskorte, N. Visual representations are dominated by intrinsic fluctuations correlated between areas. Neuroimage 114, 275–286 (2015).
Naselaris, T., Kay, K. N., Nishimoto, S. & Gallant, J. L. Encoding and decoding in fMRI. Neuroimage 56, 400–410 (2011).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25 https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html, 1097–1105 (2012).
Cadena, S. A. et al. Deep convolutional models improve predictions of macaque V1 responses to natural images. PLoS Comput. Biol. 15, e1006897 (2019).
Wang, A., Tarr, M. & Wehbe, L. Neural Taskonomy: Inferring the Similarity of Task-Derived Representations from Brain Activity. In Advances in Neural Information Processing Systems 32 https://papers.nips.cc/paper/2019/hash/f490c742cd8318b8ee6dca10af2a163f-Abstract.html, 15475–15485 (2019).
Sinz, F. H., Pitkow, X., Reimer, J., Bethge, M. & Tolias, A. S. Engineering a less artificial intelligence. Neuron 103, 967–979 (2019).
Aliko, S., Huang, J., Gheorghiu, F., Meliss, S. & Skipper, J. I. A naturalistic neuroimaging database for understanding the brain using ecological stimuli. Sci. Data 7, 347 (2020).
Nastase, S. A., Liu, Y.-F., Hillman, H., Norman, K. A. & Hasson, U. Leveraging shared connectivity to aggregate heterogeneous datasets into a common response space. Neuroimage 217, 116865 (2020).
Taylor, J. R. et al. The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) data repository: structural and functional MRI, MEG, and cognitive data from a cross-sectional adult lifespan sample. Neuroimage 144, 262–269 (2017).
Bellec, P. & Boyle, J. A. Bridging the gap between perception and action: the case for neuroimaging, AI and video games. Preprint at https://psyarxiv.com/3epws (2019).
Pinho, A. L. et al. Individual Brain Charting, a high-resolution fMRI dataset for cognitive mapping. Sci. Data 5, 180105 (2018).
Poldrack, R. A. et al. Long-term neural and physiological phenotyping of a single human. Nat. Commun. 6, 8885 (2015).
Seeliger, K., Sommers, R. P., Güçlü, U., Bosch, S. E. & van Gerven, M. A. J. A large single-participant fMRI dataset for probing brain responses to naturalistic stimuli in space and time. Preprint at https://www.biorxiv.org/content/10.1101/687681v1 (2019).
Naselaris, T., Allen, E. & Kay, K. Extensive sampling for complete models of individual brains. Curr. Opin. Behav. Sci. 40, 45–51 (2021).
Polimeni, J. R., Renvall, V., Zaretskaya, N. & Fischl, B. Analysis strategies for high-resolution UHF-fMRI data. Neuroimage 168, 296–320 (2018).
Harms, M. P. et al. Extending the Human Connectome Project across ages: imaging protocols for the Lifespan Development and Aging projects. Neuroimage 183, 972–984 (2018).
Power, J. D. et al. Customized head molds reduce motion during resting state fMRI scans. Neuroimage 189, 141–149 (2019).
Brainard, D. H. The Psychophysics Toolbox. Spat. Vis. 10, 433–436 (1997).
Pelli, D. G. The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spat. Vis. 10, 437–442 (1997).
Caesar, H., Uijlings, J. & Ferrari, V. COCO-Stuff: Thing and Stuff classes in context. In IEEE/CVF Conf. Computer Vision and Pattern Recognition https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00132, 1209–1218 (2018).
Schira, M. M., Tyler, C. W., Breakspear, M. & Spehar, B. The foveal confluence in human visual cortex. J. Neurosci. 29, 9050–9058 (2009).
Shahid, A., Wilkinson, K., Marcu, S. & Shapiro, C. M. Stanford Sleepiness Scale (SSS). In: STOP, THAT and One Hundred Other Sleep Scales (eds. Shahid, A., Wilkinson, K., Marcu, S. & Shapiro, C. M.) 369–370 (Springer, 2012).
Marks, D. F. Visual imagery differences in the recall of pictures. Br. J. Psychol. 64, 17–24 (1973).
Torgesen, J. K., Wagner, R. & Rashotte, C. TOWRE-2: Test of Word Reading Efficiency (Pearson, 2012).
Duchaine, B. & Nakayama, K. The Cambridge Face Memory Test: results for neurologically intact individuals and an investigation of its validity using inverted face stimuli and prosopagnosic participants. Neuropsychologia 44, 576–585 (2006).
Tardif, J., Watson, M., Giaschi, D. & Gosselin, F. Measuring the contrast sensitivity function in just three clicks. J. Vis. 16, 966–966 (2016).
Arora, S., Liang, Y. & Ma, T. A simple but tough-to-beat baseline for sentence embeddings. https://openreview.net/pdf?id=SyK00v5xx (2017).
Kriegeskorte, N. & Mur, M. Inverse MDS: inferring dissimilarity structure from multiple item arrangements. Front. Psychol. 3, 245 (2012).
Kay, K., Jamison, K. W., Zhang, R.-Y. & Uğurbil, K. A temporal decomposition method for identifying venous effects in task-based fMRI. Nat. Methods 17, 1033–1039 (2020).
Avants, B. B. et al. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage 54, 2033–2044 (2011).
Yushkevich, P. A. et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 31, 1116–1128 (2006).
Esteban, O. et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods 16, 111–116 (2019).
Power, J. D., Barnes, K. A., Snyder, A. Z., Schlaggar, B. L. & Petersen, S. E. Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. Neuroimage 59, 2142–2154 (2012).
Handwerker, D. A., Gonzalez-Castillo, J., D’Esposito, M. & Bandettini, P. A. The continuing challenge of understanding and modeling hemodynamic variation in fMRI. Neuroimage 62, 1017–1023 (2012).
Hoerl, A. E. & Kennard, R. W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970).
Kay, K. N., Winawer, J., Mezer, A. & Wandell, B. Compressive spatial summation in human visual cortex. J. Neurophysiol. 110, 481–494 (2013).
Lage-Castellanos, A., Valente, G., Formisano, E. & De Martino, F. Methods for computing the maximum performance of computational models of fMRI responses. PLoS Comput. Biol. 15, e1006397 (2019).
Biswal, B., Yetkin, F. Z., Haughton, V. M. & Hyde, J. S. Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magn. Reson. Med. 34, 537–541 (1995).
Nili, H. et al. A toolbox for representational similarity analysis. PLoS Comput. Biol. 10, e1003553 (2014).
Kriegeskorte, N., Mur, M. & Bandettini, P. Representational similarity analysis—connecting the branches of systems neuroscience. Front. Syst. Neurosci. 2, 4 (2008).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Gorgolewski, K. J. et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci. Data 3, 1–9 (2016).
Cichy, R. M., Roig, G. & Oliva, A. The Algonauts Project. Nat. Mach. Intell. 1, 613 (2019).
We thank the NSD participants for their time and endurance; E. Aminoff, J. Pyles, M. Tarr, M. Hebart and C. Baker for advice on experimental design and data collection; J. Power and A. Schapiro for consultation on resting-state and physiological data; V. Carr and R. Olsen for consultation on hippocampal subfield scanning protocols; A. Grant for assistance with scanner peripherals; F. Gosselin and J. Tardif for contrast sensitivity analysis; B. Klimes-Dougan and K. Cullen for designing the valence/arousal assessment; W. Guo for segmentations of the medial temporal lobe; M. Arcaro, A. Bratch, D. Finzi, A. White and J. Winawer for assistance with ROI definition; C. Gorgolewski and R. Poldrack for discussion of BIDS and data sharing; R. Cichy, E. Yacoub, K. Grill-Spector, K. Jamison, A. Rokem, A. Huth, S. Anzellotti, N. Kriegeskorte and J. Winawer for general discussions; and K. Ugurbil for overall project advice. We also thank our NSD collaborators for shaping the trajectory of the project. This work was supported by NSF CRCNS grants IIS-1822683 (K.K.) and IIS-1822929 (T.N.); NIH grants P41 EB015894, P30 NS076408, S10 RR026783 and S10 OD017974-01, the W. M. Keck Foundation and the NIMH Intramural Research Program ZIAMH002909 (M.N.); and NSF BCS-1734853, NIH NIBIB R01EB030896, NIH NIBIB R01EB029272 and NIH IIS-1912270 (F.P.).
The authors declare no competing financial interests.
Peer review information Nature Neuroscience thanks Evan Gordon, Andrew Zalesky, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a, Image presentations. Each of 10,000 distinct images was placed 3 times on a circle according to a probability distribution created by mixing a relatively narrow von Mises distribution and a uniform distribution. The resulting image sequence was divided into 40 equally sized segments for the 40 NSD scan sessions. b, Basic statistics of image repetitions. We define a novel trial as a trial involving an image never shown before, an old trial as a trial that is not a novel trial, and an easy trial as an old trial for which the presented image had been shown previously in the same scan session.
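The placement scheme in a can be sketched as follows; the mixture weight, the concentration (kappa), and the use of a random per-image anchor are illustrative assumptions, not the exact NSD design parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_offsets(n, kappa=25.0, mix=0.5):
    """Angular offsets drawn from a mixture of a narrow von Mises
    distribution and a uniform distribution on the circle. The mixture
    weight and concentration here are illustrative, not the NSD values."""
    use_vm = rng.random(n) < mix
    return np.where(use_vm,
                    rng.vonmises(0.0, kappa, n),       # clustered component
                    rng.uniform(-np.pi, np.pi, n))     # diffuse component

# Place each of 10,000 images 3 times: offsets are applied to a random
# per-image anchor, so repetitions of an image tend to cluster on the
# circle, with occasional widely separated repetitions coming from the
# uniform component of the mixture.
n_images, n_reps = 10000, 3
anchors = rng.uniform(-np.pi, np.pi, n_images)
offsets = sample_offsets(n_images * n_reps).reshape(n_images, n_reps)
positions = (anchors[:, None] + offsets) % (2 * np.pi)
```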
This table summarizes the overall NSD data collection effort. Structural and diffusion MRI data were collected at 3T. Functional MRI data were collected at 7T. The breakdown of the 7T fMRI scan sessions is indicated: for example, subject 2 participated in 1 (prffloc) + 40 (nsd01–nsd40) + 1 (nsdsynthetic) + 1 (nsdimagery) = 43 7T fMRI scan sessions. Additional behavioral data were acquired outside of the scanner (nsdpostbehavior, nsdmemory, nsdmeadows). Note that scan sessions were occasionally split across multiple magnet entries (see aquamarine and yellow cells). For simplicity, we treat these cases as if they represent single scan sessions.
Analyses conducted in this paper can be divided into three parts. Part 1 consists of pre-processing, in which raw functional, anatomical, diffusion, and eyetracking data are transformed into various useful intermediate outcomes. In addition, coordinate transformations between various spaces are estimated and incorporated into the nsd_mapdata utility. Part 2 consists of analyses of the pre-processed fMRI data. The GLMsingle algorithm introduced in this paper is used to analyze the fMRI data from the NSD experiment (Part 2a), and standard methods are used to analyze the fMRI data from the pRF and fLoc experiments (Part 2b). Part 3 consists of specific scientific analyses demonstrated in this paper that make use of the data prepared in Parts 1 and 2. Given the extensive data preparation procedures (Parts 1–2), it is useful to comment on which aspects are fairly typical in MRI processing and which are more customized or unique to the present work. With respect to the pre-processing steps in Part 1, the general outcomes that these steps achieve are typical in MRI and are necessary for basic interpretation of the data. For example, small shifts in head position over the course of a scan session necessitate some motion compensation in order to interpret the signal from a given voxel in terms of a single brain location. The specific methods by which we execute these pre-processing steps may differ from what is performed in commonly used software packages (for example, SPM, FSL, AFNI). However, the outcomes are similar at a conceptual level: for example, the fMRI data are pre-processed using temporal interpolation of voxel-wise time-series data and spatial interpolation of brain volumes. With respect to the additional preparation procedures in Part 2, the procedures in Part 2b are fairly typical analyses used to functionally localize brain regions. 
More customized and unique to the present work are the procedures in Part 2a, which are designed to improve the accuracy of single-trial fMRI amplitude estimates. We provide evidence that these procedures do in fact perform as intended (see Fig. 3 and Extended Data Fig. 8).
a, Pre-processing of eyetracking data. Blinks and tracking noise were removed, followed by linear detrending, median-centering, downsampling, and smoothing. Runs with less than 1/3 valid samples after these cleaning procedures were excluded from further analysis (see Supplementary Note 5). Shown are results for an example run (subject 1, nsd31 scan session, run 6). Pre-processing reduced noise without obscuring potential eye movements. b, Fraction of time during which deviation from central fixation was less than a specific threshold. Results are shown for a range of thresholds (left) and for a threshold of 1° (right). c, 2D histograms of gaze positions. The main images show histogram results on a linear scale; the inset images show results on a log scale. To summarize the results, we overlay a gray ellipse marking the central 90% of a multivariate 2D Gaussian distribution that has been fit to the gaze positions, as well as a blue circle containing 90% of the gaze positions. Both the parametric and non-parametric approaches yield similar results and indicate that gaze positions of all subjects clustered around central fixation. The level of precision varied across subjects. The number of usable eyetracking runs for each subject is indicated by the white text. d, Example of accurate fixation behavior (subject 1, nsd31 scan session, run 8). Shown are pre-processed vertical gaze coordinates (top left), normalized pupil area (bottom left), and a 2D scatter plot of gaze positions (right). e, Example of eye movements (subject 5, nsd29 scan session, run 11). Same format as d. Notice that eye movements manifest as staircase structure in the vertical gaze coordinates and as dispersed gaze positions in the scatter plot. f, Trial-wise time-resolved analysis. Relative to stimulus trial onsets, we plot the across-trial median deviation from central fixation (top), as well as the across-trial median pupil size after mean-centering the pupil size within each trial (bottom). 
Results for subjects 3 and 8 are not available for this analysis. Overall, the results show that subjects were able to maintain fixation most of the time: gaze positions were within 1° of central fixation 68–97% of the time (see b). Three subjects are worth further discussion. Subject 4 exhibited eye movements after stimulus onset (see f, top); however, this is of minor concern given that these movements were small. Subject 5 exhibited more substantial eye movements (see c, e, and f); we suggest exclusion of this subject from analyses of the NSD fMRI data that are contingent on strict central fixation. Finally, while our results indicate fixation instability for subject 8 (see b and c), careful inspection of the eyetracking video recordings (available online) suggests this reflects pupil tracking noise rather than actual eye movements made by the subject.
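The fixation summary in b amounts to thresholding the Euclidean deviation of cleaned gaze samples from central fixation; a minimal sketch on synthetic data (the 1° threshold matches the caption, everything else is illustrative):

```python
import numpy as np

def fixation_fraction(gaze_x, gaze_y, threshold_deg=1.0):
    """Fraction of valid samples whose Euclidean deviation from central
    fixation (0, 0) is below threshold_deg. Coordinates are in degrees of
    visual angle; NaNs mark samples removed during blink/noise cleaning."""
    dev = np.hypot(gaze_x, gaze_y)
    valid = ~np.isnan(dev)
    return float(np.mean(dev[valid] < threshold_deg))

# synthetic example: tight fixation with occasional blink dropouts
rng = np.random.default_rng(2)
x = rng.normal(0.0, 0.3, 5000)
y = rng.normal(0.0, 0.3, 5000)
x[::100] = np.nan                  # samples dropped during cleaning
frac_within_1deg = fixation_fraction(x, y)
```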
a, Comparison of approaches. For an example coronal slice in subject 1, we compare the non-upsampled 1.8-mm preparation of the data (left), the upsampled 1-mm preparation of the data (right), and a version of the 1.8-mm results that has been post-hoc upsampled to 1-mm resolution to enable direct comparison (middle). Two quantities are shown: mean signal intensity and variance explained by an ON-OFF GLM model. b, Zoomed view of white rectangle marked in a. c, Profile view of blue dotted horizontal line marked in b. Error bars in the bottom plot indicate ± 1 SEM across 40 scan sessions (error bars are small and nearly invisible). d, Timecourse estimates for voxels marked by orange arrowheads at the bottom of c. Each colored trace corresponds to an estimate of the hemodynamic timecourse for a single voxel in one NSD scan session from the upsampled 1-mm data preparation. The beginning of the timecourses (first vertical line) corresponds to the onset of the 3-s image presentation. The results shown in this figure support the idea that the upsampled data preparation preserves fine-scale spatial detail that is lost (blurred away) under a non-upsampled data preparation. While the effects are small, preserving as much detail as possible may be critical for certain neuroscientific questions.
Extended Data Fig. 6 Reliable diffusion derivatives facilitate investigation of white-matter connectivity.
a, Fractional anisotropy (FA). The left shows tractography and FA results for the optic radiation identified in subject 7. The right shows reliability of FA results for 61 white-matter tracts identified using the atlas from Bullock et al.114. For other measures, see Supplementary Fig. 5c–e. b, Structural connectivity. Using 43 visual areas × 2 hemispheres = 86 regions from the HCP-MMP1 atlas109 (left), we construct group-average connectivity matrices indicating the density of fibers connecting pairs of regions (right). c, Quantitative summary. Each dot represents fiber density between a pair of regions (as in b). Dot colors reflect different region pairs but are otherwise arbitrary. Group-average results (main figure) and results for an individual subject (inset) are shown.
A variety of ROIs were defined based on auxiliary fMRI experiments (pRF, fLoc). In a–c, we show example results for subject 3, right hemisphere. a, Early visual areas. Results are shown on FreeSurfer’s sphere surface as well as in the 0.8-mm anatomical volume space. b, Eccentricity-based regions. Similar format to a. Note that the total stimulus extent is 8.4° × 8.4° in the pRF, fLoc, and NSD experiments. c, Face-selective regions. Regions were defined based on t-values computed for the contrast of faces against all other categories. Results are shown on FreeSurfer’s inflated surface as well as in the 0.8-mm anatomical space. d, Probabilistic maps of ROI locations. For each of three example ROIs, we map the location of the ROI in each subject to fsaverage and then compute, for each fsaverage vertex, the fraction of subjects labeled at that vertex. Notice there is reasonable consistency across subjects in fsaverage space.
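The probabilistic maps in d reduce to a per-vertex average of subject label indicators once every subject's ROI has been mapped to fsaverage; a minimal sketch with toy labels:

```python
import numpy as np

def probabilistic_roi_map(labels):
    """Per-vertex fraction of subjects labeled. labels: boolean array of
    shape (subjects, vertices), with each subject's ROI already mapped to
    a common fsaverage surface."""
    return np.asarray(labels, dtype=float).mean(axis=0)

# toy example: 8 subjects, 10 vertices; the ROI core is shared by all
labels = np.zeros((8, 10), dtype=bool)
labels[:, 3:7] = True        # extent labeled in every subject
labels[0, 7] = True          # one subject's ROI extends one vertex further
prob_map = probabilistic_roi_map(labels)
```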
We prepared three beta versions (b1, b2, b3) reflecting GLM analyses of increasing sophistication. a, Inspection of NSD betas. The full set of estimated single-trial responses (1.8-mm preparation, beta version b1) is shown for voxels in subject 1 right hemisphere region of interest (ROI) FFA-1 (fusiform face area subdivision 1). We observe horizontal stripes, indicative of gross variation in percent BOLD signal change across voxels. b, Zoomed view of one scan session. Shown are all three beta versions, as well as the result of z-scoring betas within each scan session (in general, we suggest that users may wish to z-score each voxel’s responses within each scan session in order to eliminate potential non-stationarities and to equalize units across voxels). The different beta versions generally resemble one another (left column), implying that the variations in GLM methods do not drastically change the data. Vertical stripes visible in the visualizations tend to decrease from b1 to b2, suggesting that fitting voxel-wise HRFs reduces artifacts. Vertical stripes also tend to decrease from b2 to b3, which might reflect the reduction of correlated noise achieved by GLMdenoise. c, Detailed inspection of one voxel. To assess the reliability of evoked responses, we group trials according to the image presented. The estimated signal standard deviation (σsignal) and noise standard deviation (σnoise) are illustrated at the right of each subplot. Notice that b2 and b3 reduce variability of betas across the 3 trials associated with each image. d, Response reliability. Here we plot single-trial responses observed in two example ROIs (1.8-mm preparation, beta version b2, right hemisphere FFA-1 and PPA (parahippocampal place area), response averaged across voxels in each ROI), showing the first 50 of the shared515 images. The left column shows responses for different trials in subject 1; the right column shows trial-averaged responses in different subjects. 
Lines connecting consecutive images are used to aid visualization but do not indicate specific temporal relationships between images. Thick black lines indicate the mean across trials (left) or subjects (right). Notice that reliability is reasonably high both within and across subjects. e, Quantitative summary. To summarize results shown in d, we plot the correlation between responses to the shared515 images across all trials and all subjects. Thin white horizontal and vertical lines separate different subjects (each having 3 trials). Notice there is high reliability within each ROI, and responses are highly dissimilar across ROIs. The strong off-diagonal elements (white arrows) indicate the presence of spatial noise correlations that occur on individual trials, which is typical in fMRI45. Noise correlations likely reflect a combination of measurement noise (for example, head motion) and real neural activity variability (for example, arousal effects). In some cases, correlations are larger across subjects than within subjects; one explanation is that there is, to some degree, a common ROI representation and a noisy measurement of this representation obtained in one subject might actually be better correlated with a less noisy measurement of this representation obtained in a different subject. Also, the results indicate the existence of temporal ordering effects (for example, trial 1 in a given subject tends to be more correlated with trial 1 in other subjects as opposed to trials 2 or 3). This likely indicates the presence of adaptation- and/or memory-related effects in the NSD data, given that the temporal ordering of trials was fixed across subjects.
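Two operations described in this caption, session-wise z-scoring of betas and partitioning response variance into σsignal and σnoise from repeated trials, can be sketched as follows. The variance-partitioning estimator shown is one standard scheme and is not necessarily the exact estimator used for the figure:

```python
import numpy as np

def zscore_within_session(betas, session_ids):
    """Z-score each voxel's single-trial betas within each scan session.
    betas: (voxels, trials); session_ids: (trials,)."""
    out = np.empty_like(betas, dtype=float)
    for s in np.unique(session_ids):
        m = session_ids == s
        mu = betas[:, m].mean(axis=1, keepdims=True)
        sd = betas[:, m].std(axis=1, keepdims=True)
        out[:, m] = (betas[:, m] - mu) / sd
    return out

def signal_noise_std(betas_by_image):
    """Estimate sigma_signal and sigma_noise for one voxel from repeated
    trials. betas_by_image: (images, repeats). Noise variance is the mean
    within-image variance across repeats; signal variance is the
    across-image variance of the repeat means, corrected for the noise
    they still contain."""
    n_rep = betas_by_image.shape[1]
    var_noise = betas_by_image.var(axis=1, ddof=1).mean()
    var_means = betas_by_image.mean(axis=1).var(ddof=1)
    var_signal = max(var_means - var_noise / n_rep, 0.0)
    return np.sqrt(var_signal), np.sqrt(var_noise)

rng = np.random.default_rng(3)

# simulate one voxel: per-image signal sd = 2, trial noise sd = 1, 3 repeats
reps = rng.normal(0.0, 2.0, 1000)[:, None] + rng.normal(0.0, 1.0, (1000, 3))
sigma_signal, sigma_noise = signal_noise_std(reps)

# simulate session-wise offsets across 4 sessions and remove them
session_ids = np.repeat(np.arange(4), 750)
betas = rng.normal(5.0, 3.0, (10, 3000)) + np.linspace(0.0, 2.0, 4).repeat(750)
zb = zscore_within_session(betas, session_ids)
```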
Here we show results from the analysis of the pRF experiment and results from an analogous analysis performed on trial-averaged NSD betas (see Supplementary Modeling Note 1 for details). Each panel shows an occipital view of FreeSurfer’s sphere surface, and white lines indicate borders of visual areas V1–hV4 (defined based on results of the pRF experiment). Angle and eccentricity estimates are plotted using the same colormaps as in Benson et al.30 We also plot the amount of time-series variance explained in the pRF data (variance relative to the mean signal level) and the amount of variance explained in the NSD betas (variance relative to 0% BOLD signal change). Clear retinotopic maps in early visual cortex are visible in the NSD results, including robust angle estimates even in foveal regions. In addition, there is high consistency of retinotopic estimates across the pRF and NSD datasets. There is some discrepancy in absolute eccentricity estimates at peripheral locations; this is likely due to technical differences in how modeling procedures behave for voxels near the stimulus edge.
a, Illustration of an encoding model that predicts brain activity in a given voxel (rtv) in response to images (xt). Images are passed to nonlinear feature extractors, ηl (trapezoids), that output feature maps (grey cuboids). Feature maps are grouped, passed through an element-wise nonlinearity, f(·), and then multiplied pixel-wise by a spatial pooling field (g1,…,gN where superscripts index distinct groups of feature maps) that determines the region of visual space that drives voxel activity. The weighted pixel values in each feature map are then summed, reducing each feature map to a scalar value. These scalar values are concatenated across all feature maps, forming a single feature vector that is passed through another element-wise nonlinearity (left black rectangle) and then weighted by a set of feature weights, w (right black rectangle), to yield predicted voxel activity. Note that for each type of encoding model (for example, AlexNet-based encoding model, GNet-based encoding model), the feature extractors are identical for all voxels, but the spatial pooling fields and feature weights are optimized and may vary across voxels. For the AlexNet-based encoding model, the feature extractors were pre-specified, the spatial pooling fields were optimized via line search, and the feature weights w were optimized via ridge regression. For the GNet-based encoding model, stochastic gradient descent with early stopping was used to optimize the parameters of the feature extractors ηl, the spatial pooling fields g1,…,gN, and the feature weights w. b, Illustration of spatial pooling fields. For the AlexNet model, a single isotropic 2D Gaussian pooling field (middle) selected from a set of candidates (right) was applied to all feature maps. For the GNet model, an independent, flexible pooling field (left) was applied to each group of feature maps. 
Applying flexible pooling fields to AlexNet leads to lower prediction accuracy overall, so we present the version that uses isotropic 2D Gaussian fields. c, Comparative architecture of AlexNet and GNet. AlexNet and GNet are both deep convolutional neural networks, but differ in the types and sequencing of layers (rows of the table). The first three layers are the same for both networks and correspond to the first three layers of an AlexNet trained to classify objects in the ImageNet dataset. For both networks, these shared ‘pre-filtering’ layers are followed by sequences of convolutional layers (rows labeled ‘conv’; values indicate feature depth and convolutional filter resolution; ‘str’ = filter stride, ‘pad’ = convolutional padding), max-pooling layers (‘maxpool’), batch-normalization and weight-dropout layers (‘batchnorm + dropout’), adaptive averaging layers (‘adaptive avg’), and fully-connected layers (‘fully con.’; value indicates number of units). Feature maps in the convolutional or fully connected layers (indicated by red arrows; resolution of the feature maps in parentheses) are used as predictors of brain activity in the context of an encoding model (see a).
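The forward pass in a can be sketched as follows; the shapes, nonlinearities, and Gaussian pooling-field parameterization are illustrative (panel b notes that the AlexNet variant uses a single isotropic 2D Gaussian field shared across feature maps):

```python
import numpy as np

def gaussian_pooling_field(size, mu_row, mu_col, sigma):
    """Isotropic 2D Gaussian pooling field (the AlexNet variant in panel b),
    normalized to sum to 1."""
    r, c = np.mgrid[0:size, 0:size]
    g = np.exp(-((r - mu_row) ** 2 + (c - mu_col) ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def predict_voxel(feature_map_groups, pooling_fields, w,
                  f=np.abs, f2=lambda v: v):
    """One voxel's predicted activity. Each group l of feature maps
    (depth_l, H, W) is passed through the element-wise nonlinearity f,
    weighted pixel-wise by its pooling field g^l, and summed over space,
    reducing each feature map to a scalar. The concatenated scalars pass
    through a second nonlinearity f2 and are weighted by w."""
    pooled = [(f(maps) * g[None]).sum(axis=(1, 2))
              for maps, g in zip(feature_map_groups, pooling_fields)]
    features = f2(np.concatenate(pooled))
    return float(features @ w)

rng = np.random.default_rng(4)
maps = [rng.standard_normal((8, 16, 16)),    # group 1: depth 8
        rng.standard_normal((4, 16, 16))]    # group 2: depth 4
g = gaussian_pooling_field(16, 8.0, 8.0, 3.0)
w = rng.standard_normal(12)                  # one weight per feature map
pred = predict_voxel(maps, [g, g], w)
```

In the GNet variant, the pooling fields would be free per-group parameter arrays optimized jointly with the feature extractors, rather than a Gaussian selected from a candidate grid.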
Allen, E.J., St-Yves, G., Wu, Y. et al. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat Neurosci 25, 116–126 (2022). https://doi.org/10.1038/s41593-021-00962-x