Background & Summary

fMRI is an invaluable non-invasive tool for illuminating the brain’s architecture, especially for human-unique abilities like language. A common analytic approach in fMRI language studies is to average activation maps voxel-wise in a common brain space and perform statistical inference across individuals in each voxel. However, because of the well-established inter-individual variability in the locations of functional areas in the association cortex1,2, activations do not align well across individuals, leading to low sensitivity and functional resolution3. Further, the results of group-averaging analyses are generally interpreted through reverse inference from anatomical locations to function4,5, but because of the variability mentioned above, combined with the functional heterogeneity of the association cortex, locations in a common brain space cannot be meaningfully linked to function (see6 for a discussion of this issue for ‘Broca’s area’).

An alternative analytic approach, which circumvents voxel-wise brain averaging, is known as ‘functional localization’3,7. In this approach, a brain region or network that supports a mental process of interest is identified with a functional contrast in each individual and then its responses to some new critical condition(s) are examined. This approach yields greater sensitivity, functional resolution, and interpretability, and has been successful across many domains of perception and cognition, including language. As a result, many research groups are now moving away from group-averaging analyses toward individual-subject analyses4,8.

However, functional localization is not always feasible. Further, although studies that rely on functional localization can be straightforwardly compared to each other, it is at present unclear how to relate the results from such studies to group-averaging fMRI studies, or other studies that rely on brain averaging (e.g., studies that use voxel-based morphometry (VBM) or voxel-based lesion-symptom mapping (VLSM) in patient work9). To help bridge the gap between these two analytic traditions in language research, we created a probabilistic functional atlas of the language network (‘Language Atlas’ or LanA) by overlaying 806 individual activation maps for a robust at the individual-subject level and extensively validated language ‘localizer’10,11.

The language localizer relies on a contrast between the processing of sentences and a linguistically/acoustically degraded control condition and is robust to changes in materials, modality of presentation, and task (see Methods). This localizer identifies the left-lateralized fronto-temporal language network (e.g.12,13,14) that selectively6,15 supports high-level language comprehension and production16,17,18, including the processing of word meanings and combinatorial syntactic/semantic processing19,20,21. By design, this contrast excludes lower-level perceptual22,23,24,25 and speech-articulatory26,27,28 processes, as well as discourse-level comprehension29,30,31,32. Further, a network that closely corresponds to this functional contrast emerges from task-free resting state data33. (Many researchers have postulated functional dissociations among the different brain regions that comprise the language network (e.g.12,13,14,34,35,36). However, the empirical landscape remains complex and ridden with controversy, and the evidence is now overwhelming that—even if dissociations exist within the network— all the language regions are strongly synchronized in their activity33,37,38, suggesting that they form a functionally integrated system).

LanA allows one to estimate for any location in a common brain space the probability that it falls within the language network. In this way, this atlas can provide a common reference frame and help interpret (a) group-level activation peaks from past and future fMRI studies, or results of meta-analyses of such peaks39, (b) lesion locations in individual brains40 or lesion overlap loci in VBM/VLSM analyses, and (c) electrode locations in ECoG/SEEG investigations or locations of source-localized activity in MEG studies. Furthermore, LanA (d) can help select language-selective units (voxels, electrodes, MEG channels, even single cells) for analysis in existing datasets, including in studies that aim to relate human neural representations to those from artificial neural network language models41,42,43,44,45,46, (e) can be related voxel-by-voxel to any whole-brain data47, including structural data, gene expression data48, or receptor density data49, in order to ask whether/how these features correlate with the language network’s topography, and (f) can help select patches in post-mortem brains for cellular analyses to maximize the chances of examining language cortex. Finally, LanA (g) can help guide/constrain functional mapping during brain surgery when fMRI is not possible, although, of course, no clinical decisions should be made based on LanA alone. We make the atlas available for two most commonly used brain templates (Fig. 1): a volume-based template (MNI IXI549Space; SPM1250) and a surface-based template (fsaverage; FreeSurfer51). The use of these common data formats will allow for easy interfacing with existing open data repositories such as NeuroVault52 and ENIGMA53. We emphasize that LanA is not a replacement for localizers: when possible, a language localizer task should be performed54. As we show in SI-1, the effect sizes obtained from group-level regions of interest (ROIs) based on LanA, or from commonly used Glasser parcels55 are underestimated relative to individually defined language functional ROIs.

Fig. 1
figure 1

Language atlas topography. Probabilistic functional atlas for the language > control contrast based on overlaid individual binarized activation maps (where in each map, the top 10% of voxels are selected, as described in the text). (a) SPM-analyzed volume data in the MNI template space (based on 806 individual maps). (b) FreeSurfer-analyzed surface data in the FSaverage template space (based on 804 individual maps). In both figures, the color scale reflects the proportion of participants for whom that voxel/vertex belongs to the top 10% of language > control voxels/vertices (threholded at p = 0.2 for visualization purposes).

We also release (i) individual activation maps (in the MNI and FSaverage spaces), along with demographic data, and (ii) individual-level neural markers (based on the volumetric analyses), including effect sizes, voxel count (activation extent), and stability of activation across runs. The neural marker data can be used as normative distributions, based on neurotypical relatively young adults, against which any new population (e.g., children or individuals with developmental or acquired brain disorders) can be evaluated.



A total of 806 neurotypical adults (477, ~59%, female), aged 19 to 75 (441, ~55%, aged 19–29; 310, ~38%, aged 30–39; 55, ~7%, aged 40+), participated for payment between September 2007 and June 2021, as summarized in Table 1. All participants had normal or corrected-to-normal vision, and no history of neurological, developmental, or language impairments. Handedness information was collected for 758 (~94%) of the 806 participants. Of those, 707 participants (~93%) were right-handed, as determined by the Edinburgh handedness inventory56 or self-report, 38 (~5%) were left-handed, and 13 (~2%) – ambidextrous. (The participants for whom handedness is missing in the database are most likely right-handed because most of them were tested during the earlier years of data collection when right-handedness was one of the requirements for participation.) Of the 806 participants, 629 (~78%) were native speakers of English, and the remaining 177 (~22%) – native speakers of another language and proficient speakers of English (see38 for evidence that the topography of language responses for a language that an individual is proficient in is similar to that of their native language, and see SI-2 for a comparison between the atlas generated using the 629 native English speakers vs. the 177 proficient non-native speakers). Given this demographic distribution, this atlas represents certain populations better than others, and these biases should be taken into account when the data are interpreted, including in comparison to other populations.

Table 1 Participant demographics.

Each participant completed a language localizer task10 as part of one of the studies in the Fedorenko lab. Each scanning session lasted between 1 and 2 hours and included a variety of additional tasks. All participants gave informed written consent in accordance with the requirements of the MIT’s Committee on the Use of Humans as Experimental Subjects (COUHES).

Participant and session selection

The 806 scanning sessions above (one session per participant) were selected from a total of 1,065 sessions across 819 participants that were available in the Fedorenko Lab database as of June 2021. The goal was to include as many participants as possible and, for the 163 participants who performed a language localizer in multiple sessions, to select a single session with high-quality data. To assess data quality, we examined the stability of the activation topography for the language localizer contrast (see Language localizer paradigm) across runs. This analysis was performed on the data preprocessed and analyzed in the volume (i.e., SPM-based analyses; see SPM preprocessing and analysis pipeline). For 1,062 of the 1,065 sessions, we calculated voxel-wise spatial correlations in activation magnitudes the language > control contrast (see Language localizer paradigm) between the odd-numbered and even-numbered runs (the three remaining sessions consisted of a single run and were evaluated by visual inspection of the contrast maps). The correlation values were calculated within the language ‘parcels’—masks that denote typical locations of language areas. These masks (available at54 were derived from a probabilistic language atlas based on 220 participants (a subset of the participants in the current set of 806) and have been used in much past work57,58,59,60,61,62,63. Six masks (three in the frontal cortex and three in the temporal and parietal cortex) were derived from the probabilistic atlas in the left hemisphere and mirror-projected onto the right hemisphere. For each session, the correlation values were averaged across the twelve parcels, leading to a single value per session. This spatial correlation measure quantifies the stability of the activation landscape and is an objective proxy for data quality; it is affected by factors like head motion or sleepiness, but does not require subjective visual inspection of contrast maps (see SI-4 for evidence that this measure works similarly when considering all voxels vs. only voxels with positive values for the contrast of interest, suggesting that the values are not driven by the difference between responsive and non-responsive voxels). Sessions where the spatial correlation value was negative (n = 23; ~2%) were excluded, leaving 1,042 sessions across 806 participants. For the 163 participants with more than one session, we selected the session with the highest spatial correlation value for inclusion in the atlas (see11 for evidence of the stability of spatial correlation values across sessions: i.e., if a participant shows a high spatial correlation in one session, they are likely to show a high spatial correlation in another session; unpublished data replicates this result across a larger population and multiple functionally distinct networks). Following this data selection procedure, the Fisher-transformed spatial correlations of the participants’ language > control contrast were r = 0.98 and r = 0.57 for the left and right hemispheres, respectively (see11 for similar values on a subset (n = 150) of these participants).

Language localizer paradigm

Across the 806 participants, ten language localizer versions were used, as summarized in Table 2. In each version, a sentence comprehension condition was contrasted with a linguistically or acoustically degraded control condition. Visual (reading) and auditory (listening) contrasts have been previously established to engage the same fronto-temporal language network10,38,64,65. Activity in this network has further been shown to not depend on task or materials10 and to show robust effects across typologically diverse languages38. Furthermore, this network can be recovered from naturalistic task-free (resting state) data based on patterns of BOLD signal fluctuations over time33,37, and corresponds nearly perfectly to the network based on the sentences > nonwords contrast10. As a result, we pooled data from across the different localizer versions in the current study (see SI-2 for evidence that an atlas defined on only Localizer Version A, used in the majority of participants, is nearly identical to an atlas that leverages data from all other versions, and SI-3 for a supplementary analysis showing robust language > control effects across all ten versions).

Table 2 Language localizer versions.

The vast majority of participants (624, ~77.4%) performed Localizer version A – a reading version, where sentences and nonword strings are presented one word/nonword at a time at the rate of 450 ms per word/nonword, in a blocked design (with 3 sentences/nonword strings in each 18 s block). Participants were instructed to read attentively and to press a button at the end of each trial, when a picture of a finger pressing a button appeared on the screen. The experiment consisted of two ~6-minute-long runs, for a total of 16 blocks for each of the two conditions. The presentation script and stimuli for this localizer version can be downloaded at54 (for the stimuli used in the other localizer versions, contact EF). Localizer versions B-G (performed by 169 participants, ~21.0%) also used visual presentation, and Localizer versions H-J (performed by 13 participants, ~1.6%) used auditory presentation. Details of the similarities and differences in trial structure, timing, and other experimental parameters across versions are summarized in Table 2.

fMRI data acquisition

Structural and functional data were collected on the whole-body, 3 Tesla, Siemens Trio scanner with a 12-channel (G1; n = 18) or 32-channel (G2; n = 788) head coil, at the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT. T1-weighted structural images were collected in 176 sagittal slices with 1 mm isotropic voxels (TR = 2,530 ms, TE = 3.48 ms). Functional, blood oxygenation level dependent (BOLD), data were acquired using an EPI sequence (with a 90 degree flip angle and using GRAPPA with an acceleration factor of 2), with the following acquisition parameters: 33 (G1) or 31 (G2) 4 mm thick near-axial slices acquired in the interleaved order (with 10% distance factor), 3.0 mm × 3.0 mm (G1) or 2.1 mm × 2.1 mm (G2) in-plane resolution, FoV in the phase encoding (A  P) direction 192 mm (G1) or 200 mm (G2) and matrix size 64 mm × 64 mm (G1) or 96 mm × 96 mm (G2), TR = 2,000 ms and TE = 30 ms. Prospective acquisition correction66 was used to adjust the positions of the gradients based on the participant’s motion from the previous TR. The first 10 s of each run were excluded to allow for steady state magnetization.

SPM preprocessing and analysis pipeline


For the SPM analyses (Fig. 2 [volume]), fMRI data were analyzed using SPM12 (release 7487), CONN EvLab module (release 19b), and custom MATLAB scripts. Each participant’s functional and structural data were converted from DICOM to NIfTI format. All functional scans were coregistered and resampled using B-spline interpolation to the first scan of the first session (Friston et al.67). Potential outlier scans were identified from the resulting subject-motion estimates as well as from BOLD signal indicators using default thresholds in CONN preprocessing pipeline (5 standard deviations above the mean in global BOLD signal change, or framewise displacement values above 0.9 mm; Nieto-Castañón68). Functional and structural data were independently normalized into a common space (the MontrealNeurological Institute [MNI] template; IXI549Space) using SPM12 unified segmentation and normalization procedure (Ashburner & Friston69) with a reference functional image computed as the mean functional data after realignment across all timepoints omitting outlier scans. The output data were resampled to a common bounding box between MNI-space coordinates (−90, −126, −72) and (90, 90, 108), using 2 mm isotropic voxels and 4th order spline interpolation for the functional data, and 1 mm isotropic voxels and trilinear interpolation for the structural data. Last, the functional data were smoothed spatially using spatial convolution with a 4 mm FWHM Gaussian kernel.

Fig. 2
figure 2

Data processing flowchart. Overview of the SPM and FreeSurfer preprocessing and analysis pipelines. Raw dicom images were converted to NIfTI format, motion-corrected, mapped to a common space and smoothed during preprocessing. Each session was then modeled, t-maps were extracted and thresholded, and all sessions were aggregated to create the probabilistic atlas.

First-level analysis

Effects were estimated using a General Linear Model (GLM) in which each experimental condition was modeled with a boxcar function convolved with the canonical hemodynamic response function (HRF) (fixation was modeled implicitly, such that all timepoints that did not correspond to one of the conditions were assumed to correspond to a fixation period). Temporal autocorrelations in the BOLD signal timeseries were accounted for by a combination of high-pass filtering with a 128 seconds cutoff, and whitening using an AR(0.2) model (first-order autoregressive model linearized around the coefficient a = 0.2) to approximate the observed covariance of the functional data in the context of Restricted Maximum Likelihood estimation (ReML). In addition to experimental condition effects, the GLM design included first-order temporal derivatives for each condition (included to model variability in the HRF delays), as well as nuisance regressors to control for the effect of slow linear drifts, subject-motion parameters, and potential outlier scans on the BOLD signal.

FreeSurfer preprocessing and analysis pipeline

For the FreeSurfer analysis (Fig. 2 [surface]), fMRI data were analyzed using FreeSurfer v6.0.0. Each participant’s functional and structural data were converted from DICOM to NIfTI format using the default unpacksdcmdir parameters. (Two of the 806 participants could not be included in this pipeline because their raw dicom files were lost, leaving 804 participants for this analysis.) The raw data were then sampled onto both hemispheres of the FSaverage surface, motion corrected and registered using the middle time point of each run. The data were then smoothed spatially with a 4 mm FWHM Gaussian filter. For the first-level analyses, effects were estimated using a GLM in which each condition was modeled with a first order polynomial regressor fitting the canonical HRF. The GLM also included nuisance regressors for offline-estimated subject-motion parameters.

Atlas creation


Using custom code (available at OSF70), we computed the overlap of the individual activation maps for the language > control contrast using the 806 participants analyzed in the SPM12 pipeline (see SI-5 for evidence that the atlas reaches stability at sample sizes much smaller than 806, which suggests that the current sample size is sufficient to be generalizable). In particular, we used whole-brain t-maps that were generated by the first-level analysis and that contain a t-value for the relevant contrast in each voxel (a post-hoc analysis compared the whole-brain t-maps to their corresponding unscaled contrast maps and found strong voxel-wise correlations over the set of 806 participants: r = 0.93 ± 0.03; see SI-2 for evidence that atlases generated from t-maps vs. contrast maps are highly similar). In each individual map, we selected the 10% of voxels across the brain with the highest t-values for the language > control contrast (average and median minimum t-values across participants were 1.73 and 1.62, respectively; average and median maximum t-values were 13.8 and 13.7, respectively). These maps were then binarized so that the selected voxels got assigned a value of 1 and the remaining voxels—a value of 0. Finally, these values were averaged across participants in each voxel. The resulting atlas contains in each voxel a value between 0 and 1, which corresponds to the proportion of the 806 participants for whom that voxel falls in the 10% of voxels across the brain with the highest t-values for the language > control contrast. In the left hemisphere, these values range from 0 to 0.82, and in the RH—from 0 to 0.64 (the values are lower in the RH presumably because the majority of the selected voxels fall in the LH: across participants, average and median proportions of selected voxels falling in the LH are 58.3% and 57.8%, respectively). For more details on ROI-level probability values, see SI-6.


Using custom code (available at OSF70), we computed the overlap of the individual activation maps for the language > control contrast, using the 804 participants analyzed in the FreeSurfer pipeline. The procedure was similar to that used for the SPM-based atlas, except that the selection of the highest t-values was performed on the surface vertices. To maintain hemispheric asymmetries, rather than evaluating each hemisphere separately, as is generally common for FreeSurfer analyses, the top 10% of vertices were selected from the vertices pooled across the LH and RH, as for the SPM-based atlas. For this atlas, in the left hemisphere, the proportion values range from 0 to 0.90, and in the RH—from 0 to 0.80 (these values are expectedly higher than those in the SPM-based atlas given the superiority of surface-based inter-individual alignment71).


We chose the top 10% approach over an approach where each individual map is thresholded at a fixed t-value (as in10) to account for inter-individual variability in the overall strength of BOLD signal responses due to trait or state factors72,73,74,75. However, because differences in the language network size may correspond to differences in linguistic experience or ability76, we additionally provide versions of the atlas that are derived from t-maps that are thresholded at p < 0.001, p < 0.01, or p < 0.05 ( Atlases based on the fixed t-value thresholding approach yield topographies that are very similar to the one based on the top 10% approach (see SI-2). The critical difference between these versions of the atlas is in the interpretation of the overlap values: whereas, as noted above, in the top 10% approach, the overlap values correspond to the proportion of the 806 participants for whom that voxel falls in the 10% of voxels across the brain with the highest t-values for the language > control contrast, in the atlases based on the fixed-thresholding approach, the overlap values correspond to the proportion of the 806 participants for whom that voxel is significant for the language > control contrast at the relevant threshold.

Note that in addition to the classic left frontal and left temporal areas (and their right-hemispheric homotopes), several other areas emerge in the atlas, including in the right cerebellum and in the visual cortex. These less canonical areas have been reported in past language studies (e.g.77,78), but we acknowledge that in general, these have not been as thoroughly functionally characterized as the core frontal and temporal areas, and may in future work be shown not to be selective and/or critical for language function.

Finally, one might ask: how similar is the topography of a probabilistic functional atlas to that of a random-effects group map for the same data. Of course, these are expected to be correlated given that voxels which are task-responsive in a greater portion of the participant population (i.e., have higher probability overlap values in the atlas) are likely to yield higher t-values in the voxel-wise t-tests (see SI-7 for this comparison for LanA). The critical advantage of the probabilistic functional atlas like LanA over a random-effects map is the straightforward interpretation of the voxel values that it affords, in terms of the probability that the voxel belongs to the relevant functional area/network (the language network in this case). Such information cannot be inferred from t-values in a random-effect map without additional assumptions/mapping functions.

Neural markers

In addition to the population-level atlases, we also provide a set of individual-level neural markers (based on the volumetric SPM analyses) for the language network in each participant. These neural markers include: effect size, voxel count, and spatial correlation (additional information on these markers provided below). All of these markers have all been shown to be reliable within individuals over time, including across scanning sessions11. We provide each of these measures for each of the ROIs constrained by the previously defined language ‘parcels’ (available at54, which include in each hemisphere three frontal parcels (inferior frontal gyrus [IFG], its orbital portion [IFGorb], and middle frontal gyrus [MFG]) and three temporal/parietal ones (anterior temporal [AntTemp], posterior temporal [PostTemp], and angular gyrus [AngG]), for a total of 12 parcels. Of the 806 participants included in the atlas, only 803 completed the 2 or more runs, as needed to calculate the effect size and spatial correlation markers; for the remaining 3 participants, only voxel count is provided.

Effect size was operationalized as the magnitude (% BOLD signal change) of the critical language > control contrast. Within each parcel, we defined—for each participant—a functional ROI (fROI) by selecting 10% of the mask’s total voxels with the highest t-values for the language > control contrast using all but one run of the data. We then extracted from the left-out run the responses to the language and control conditions and computed the language > control difference. This procedure was repeated across all run partitions. This across-runs cross-validation procedure3 ensures independence between the data used to define the fROIs and estimate their responses79. In the final step, the estimates were averaged across the cross-validation folds to derive a single value per participant per fROI. Voxel count (activation extent) was defined as the number of significant voxels for the critical language > control contrast at a fixed statistical threshold (p < 0.001 uncorrected threshold). Spatial correlation (stability of the activation landscape) was defined—for the voxels falling within the language parcels—as the Fisher-transformed Pearson correlation coefficient between the voxel responses for the language > control contrast across odd- and even-numbered runs. As noted above, for all three measures, we provide 14 values per participant: one for each of the 12 ROIs (6 in each hemisphere), and two additional per-hemisphere values (averaging across the 6 ROIs in each hemisphere). See Table 3 for a summary of these neural markers within the atlas population. Additional measures can be computed based on the measures we provide (e.g., lateralization can be computed from the voxel counts80), and other measures can be extracted from the whole-brain activation maps (see Data records).

Table 3 Neural marker distributions.

These different measures can be explored with respect to each other, or to the demographic variables (but see81 for a discussion about the prevalence of underpowered brain-behavior individual differences studies). These measures can also serve as normative distributions against which any new population can be evaluated, including children or individuals with developmental and acquired brain disorders, or otherwise atypical brains82.

Data Records

The full dataset, including the SPM and FreeSurfer atlases are available for download83 ( Along with the atlases, we make available i) individual contrast and significance maps (for both the volume-based SPM and the surface-based FreeSurfer pipelines; because we had not obtained consent for raw data release, we cannot make publicly available the raw dicom/NIfTI images), and ii) a dataset of individual neural markers.

The complete dataset can additionally be accessed at via the prepackaged download links. The ‘Download SPM Atlas’ and ‘Download FS Atlas’ links provide a copy of the language atlas in their respective formats. The SPM atlas is a single volumetric NIfTI file, whereas the FS atlas is comprised of two overlay NIfTI files, one for each hemisphere. Under ‘Download All SPM Data’ and ‘Download All FS Data’, each of the individual participant’s data can be downloaded. In particular, for each of the 806 participants (804 for FS), we provide a ‘Demograhics_&_Summary.txt’ file, which contains relevant information as in Tables 2 and 3, the individual contrast and significance maps, and a visualization of their individual activation profile in the selected template space.

As well as allowing the user to download the data, the LanA website offers opportunities for online exploration and the retrieval of relevant subsets of data. In particular, individual activation maps can be explored under the ‘Explore Activation Maps’ tab, and relevant neural markers can be explored under the ‘Explore Neural Markers’ tab. In addition, data can be filtered by demographic variables including, age, gender, handedness, native English speaker status, language network lateralization, etc., and these subsets can be downloaded, or their maps/neural markers can be explored. This flexible tool allows individual users to access relevant data for their needs without the requirement for offline filtering.

Finally, at54, we provide a version of the language localizer experiment (Localizer Version A, which is used for the majority of participants) for download.

Technical Validation

The individual participants’ data quality check was performed as described in the Participants and session selection section.

Individual localizer versions were evaluated to confirm they each elicited a strong language > control effect, as described in SI-3.

The atlas creation process was evaluated with respect to several hyperparameter choices, and the atlas remained robust to each decision, including the inclusion of non-native but proficient English speakers, different localizer versions, the use of whole-brain maps based on t-values vs. contrast values, and definition of the language system as the top 10% of language > control voxels vs. as language > control voxels that pass a specific significance threshold. We summarize the (minimal) impact of all these choices in SI-2.

Finally, in SI-1, we demonstrate that group-level ROIs defined based on the highest-overlap voxels in LanA outperform commonly used Glasser ROIs derived from multi-modal Human Connectome Project data55 in effect size estimation. The latter grossly underestimate effect sizes, especially for the frontal language areas. Of course, as expected3, individual-level language fROIs are still the best for accurately estimating effect sizes, and these outperform the group-based LanA fROIs, but in cases where individual localization may not be possible (e.g., in retroactively re-analyzing past studies), LanA-based group ROIs are recommended, as they fare substantially better than Glasser ROIs.

Usage Notes

The data records presented in this paper, including materials for download and exploration at the are available for free and fair use to individual and academic researchers, institutions, and entities provided that this work is appropriately referenced. Although this atlas has potential for clinical applications, the authors assume no responsibility for the use or misuse of LanA and associated data records in clinical and other settings.