Semantic reconstruction of continuous language from non-invasive brain recordings

Abstract

A brain–computer interface that decodes continuous language from non-invasive recordings would have many scientific and practical applications. Currently, however, non-invasive language decoders can only identify stimuli from among a small set of words or phrases. Here we introduce a non-invasive decoder that reconstructs continuous language from cortical semantic representations recorded using functional magnetic resonance imaging (fMRI). Given novel brain recordings, this decoder generates intelligible word sequences that recover the meaning of perceived speech, imagined speech and even silent videos, demonstrating that a single decoder can be applied to a range of tasks. We tested the decoder across cortex and found that continuous language can be separately decoded from multiple regions. As brain–computer interfaces should respect mental privacy, we tested whether successful decoding requires subject cooperation and found that subject cooperation is required both to train and to apply the decoder. Our findings demonstrate the viability of non-invasive language brain–computer interfaces.

Fig. 1: Language decoder.
Fig. 2: Decoding across cortical regions.
Fig. 3: Decoder applications and privacy implications.
Fig. 4: Sources of decoding error.

Data availability

Data collected during the decoder resistance experiment are available upon reasonable request but have not been publicly released owing to concern that they could be used to discover ways to bypass subject resistance. All other data are available at https://openneuro.org/datasets/ds003020 and https://openneuro.org/datasets/ds004510.

Code availability

Custom decoding code is available at https://github.com/HuthLab/semantic-decoding.

References

  1. Anumanchipalli, G. K., Chartier, J. & Chang, E. F. Speech synthesis from neural decoding of spoken sentences. Nature 568, 493–498 (2019).

  2. Pasley, B. N. et al. Reconstructing speech from human auditory cortex. PLoS Biol. 10, e1001251 (2012).

  3. Willett, F. R., Avansino, D. T., Hochberg, L. R., Henderson, J. M. & Shenoy, K. V. High-performance brain-to-text communication via handwriting. Nature 593, 249–254 (2021).

  4. Moses, D. A. et al. Neuroprosthesis for decoding speech in a paralyzed person with anarthria. N. Engl. J. Med. 385, 217–227 (2021).

  5. Huth, A. G., de Heer, W. A., Griffiths, T. L., Theunissen, F. E. & Gallant, J. L. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 532, 453–458 (2016).

  6. de Heer, W. A., Huth, A. G., Griffiths, T. L., Gallant, J. L. & Theunissen, F. E. The hierarchical cortical organization of human speech processing. J. Neurosci. 37, 6539–6557 (2017).

  7. Broderick, M. P., Anderson, A. J., Di Liberto, G. M., Crosse, M. J. & Lalor, E. C. Electrophysiological correlates of semantic dissimilarity reflect the comprehension of natural, narrative speech. Curr. Biol. 28, 803–809 (2018).

  8. Caucheteux, C. & King, J.-R. Brains and algorithms partially converge in natural language processing. Commun. Biol. 5, 134 (2022).

  9. Farwell, L. A. & Donchin, E. Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalogr. Clin. Neurophysiol. 70, 510–523 (1988).

  10. Mitchell, T. M. et al. Predicting human brain activity associated with the meanings of nouns. Science 320, 1191–1195 (2008).

  11. Pereira, F. et al. Toward a universal decoder of linguistic meaning from brain activation. Nat. Commun. 9, 963 (2018).

  12. Dash, D., Ferrari, P. & Wang, J. Decoding imagined and spoken phrases from non-invasive neural (MEG) signals. Front. Neurosci. 14, 290 (2020).

  13. Logothetis, N. K. The underpinnings of the BOLD functional magnetic resonance imaging signal. J. Neurosci. 23, 3963–3971 (2003).

  14. Jain, S. & Huth, A. G. Incorporating context into language encoding models for fMRI. In Advances in Neural Information Processing Systems 31 6629–6638 (NeurIPS, 2018).

  15. Toneva, M. & Wehbe, L. Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain). In Advances in Neural Information Processing Systems 32 14928–14938 (NeurIPS, 2019).

  16. Schrimpf, M. et al. The neural architecture of language: integrative modeling converges on predictive processing. Proc. Natl Acad. Sci. USA 118, e2105646118 (2021).

  17. LeBel, A., Jain, S. & Huth, A. G. Voxelwise encoding models show that cerebellar language representations are highly conceptual. J. Neurosci. 41, 10341–10355 (2021).

  18. Naselaris, T., Prenger, R. J., Kay, K. N., Oliver, M. & Gallant, J. L. Bayesian reconstruction of natural images from human brain activity. Neuron 63, 902–915 (2009).

  19. Nishimoto, S. et al. Reconstructing visual experiences from brain activity evoked by natural movies. Curr. Biol. 21, 1641–1646 (2011).

  20. Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. Improving language understanding by generative pre-training. Preprint at OpenAI https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (2018).

  21. Tillmann, C. & Ney, H. Word reordering and a dynamic programming beam search algorithm for statistical machine translation. Comput. Linguist. 29, 97–133 (2003).

  22. Lerner, Y., Honey, C. J., Silbert, L. J. & Hasson, U. Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. J. Neurosci. 31, 2906–2915 (2011).

  23. Binder, J. R. & Desai, R. H. The neurobiology of semantic memory. Trends Cogn. Sci. 15, 527–536 (2011).

  24. Deniz, F., Nunez-Elizalde, A. O., Huth, A. G. & Gallant, J. L. The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. J. Neurosci. 39, 7722–7736 (2019).

  25. Gauthier, J. & Ivanova, A. Does the brain represent words? An evaluation of brain decoding studies of language understanding. In 2018 Conference on Cognitive Computational Neuroscience 1–4 (CCN, 2018).

  26. Fedorenko, E. & Thompson-Schill, S. L. Reworking the language network. Trends Cogn. Sci. 18, 120–126 (2014).

  27. Fodor, J. A. The Modularity of Mind (MIT Press, 1983).

  28. Keller, T. A., Carpenter, P. A. & Just, M. A. The neural bases of sentence comprehension: a fMRI examination of syntactic and lexical processing. Cereb. Cortex 11, 223–237 (2001).

  29. Geschwind, N. The organization of language and the brain. Science 170, 940–944 (1970).

  30. Barsalou, L. W. Grounded cognition. Annu. Rev. Psychol. 59, 617–645 (2008).

  31. Bunzeck, N., Wuestenberg, T., Lutz, K., Heinze, H.-J. & Jancke, L. Scanning silence: mental imagery of complex sounds. Neuroimage 26, 1119–1127 (2005).

  32. Martin, S. et al. Decoding spectrotemporal features of overt and covert speech from the human cortex. Front. Neuroeng. 7, 14 (2014).

  33. Naselaris, T., Olman, C. A., Stansbury, D. E., Ugurbil, K. & Gallant, J. L. A voxel-wise encoding model for early visual areas decodes mental images of remembered scenes. Neuroimage 105, 215–228 (2015).

  34. Silbert, L. J., Honey, C. J., Simony, E., Poeppel, D. & Hasson, U. Coupled neural systems underlie the production and comprehension of naturalistic narrative speech. Proc. Natl Acad. Sci. USA 111, E4687–E4696 (2014).

  35. Fairhall, S. L. & Caramazza, A. Brain regions that represent amodal conceptual knowledge. J. Neurosci. 33, 10552–10558 (2013).

  36. Popham, S. F. et al. Visual and linguistic semantic representations are aligned at the border of human visual cortex. Nat. Neurosci. 24, 1628–1636 (2021).

  37. Çukur, T., Nishimoto, S., Huth, A. G. & Gallant, J. L. Attention during natural vision warps semantic representation across the human brain. Nat. Neurosci. 16, 763–770 (2013).

  38. Kiremitçi, I. et al. Attentional modulation of hierarchical speech representations in a multitalker environment. Cereb. Cortex 31, 4986–5005 (2021).

  39. Mesgarani, N. & Chang, E. F. Selective cortical representation of attended speaker in multi-talker speech perception. Nature 485, 233–236 (2012).

  40. Horikawa, T. & Kamitani, Y. Attention modulates neural representation to render reconstructions according to subjective appearance. Commun. Biol. 5, 34 (2022).

  41. Rainey, S., Martin, S., Christen, A., Mégevand, P. & Fourneret, E. Brain recording, mind-reading, and neurotechnology: ethical issues from consumer devices to brain-based speech decoding. Sci. Eng. Ethics 26, 2295–2311 (2020).

  42. Kaplan, J. et al. Scaling laws for neural language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2001.08361 (2020).

  43. White, B. R. & Culver, J. P. Quantitative evaluation of high-density diffuse optical tomography: in vivo resolution and mapping performance. J. Biomed. Opt. 15, 026006 (2010).

  44. Eggebrecht, A. T. et al. A quantitative spatial comparison of high-density diffuse optical tomography and fMRI cortical mapping. Neuroimage 61, 1120–1128 (2012).

  45. Makin, J. G., Moses, D. A. & Chang, E. F. Machine translation of cortical activity to text with an encoder–decoder framework. Nat. Neurosci. 23, 575–582 (2020).

  46. Orsborn, A. L. et al. Closed-loop decoder adaptation shapes neural plasticity for skillful neuroprosthetic control. Neuron 82, 1380–1393 (2014).

  47. Goering, S. et al. Recommendations for responsible development and application of neurotechnologies. Neuroethics 14, 365–386 (2021).

  48. Levy, C. Sintel (Blender Foundation, 2010).

  49. Fedorenko, E., Hsieh, P.-J., Nieto-Castañón, A., Whitfield-Gabrieli, S. & Kanwisher, N. New method for fMRI investigations of language: defining ROIs functionally in individual subjects. J. Neurophysiol. 104, 1177–1194 (2010).

  50. Yuan, J. & Liberman, M. Speaker identification on the SCOTUS corpus. J. Acoust. Soc. Am. 123, 3878 (2008).

  51. Boersma, P. & Weenink, D. Praat: doing phonetics by computer (University of Amsterdam, 2014).

  52. Casarosa, E. La Luna (Walt Disney Pictures; Pixar Animation Studios, 2011).

  53. Sweetland, D. Presto (Walt Disney Pictures; Pixar Animation Studios, 2008).

  54. Sohn, P. Partly Cloudy (Walt Disney Pictures; Pixar Animation Studios, 2009).

  55. Jenkinson, M. & Smith, S. A global optimisation method for robust affine registration of brain images. Med. Image Anal. 5, 143–156 (2001).

  56. Dale, A. M., Fischl, B. & Sereno, M. I. Cortical surface-based analysis. I. Segmentation and surface reconstruction. Neuroimage 9, 179–194 (1999).

  57. Gao, J. S., Huth, A. G., Lescroart, M. D. & Gallant, J. L. Pycortex: an interactive surface visualizer for fMRI. Front. Neuroinform. 9, 23 (2015).

  58. Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).

  59. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).

  60. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 8024–8035 (NeurIPS, 2019).

  61. Wolf, T. et al. Transformers: state-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations 38–45 (Association for Computational Linguistics, 2020).

  62. Holtzman, A., Buys, J., Du, L., Forbes, M. & Choi, Y. The curious case of neural text degeneration. In 8th International Conference on Learning Representations 1–16 (ICLR, 2020).

  63. Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics 311–318 (Association for Computational Linguistics, 2002).

  64. Banerjee, S. & Lavie, A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization 65–72 (Association for Computational Linguistics, 2005).

  65. Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q. & Artzi, Y. BERTScore: evaluating text generation with BERT. In 8th International Conference on Learning Representations 1–43 (ICLR, 2020).

  66. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 57, 289–300 (1995).

  67. Faul, F., Erdfelder, E., Lang, A.-G. & Buchner, A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39, 175–191 (2007).

  68. Pennington, J., Socher, R. & Manning, C. D. GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing 1532–1543 (Association for Computational Linguistics, 2014).

  69. Warriner, A. B., Kuperman, V. & Brysbaert, M. Norms of valence, arousal, and dominance for 13,915 English lemmas. Behav. Res. Methods 45, 1191–1207 (2013).

  70. Brysbaert, M., Warriner, A. B. & Kuperman, V. Concreteness ratings for 40 thousand generally known English word lemmas. Behav. Res. Methods 46, 904–911 (2014).

  71. Levy, R. Expectation-based syntactic comprehension. Cognition 106, 1126–1177 (2008).

  72. Fischl, B., Sereno, M. I., Tootell, R. B. H. & Dale, A. M. High-resolution intersubject averaging and a coordinate system for the cortical surface. Hum. Brain Mapp. 8, 272–284 (1999).

Acknowledgements

We thank J. Wang, X. X. Wei and L. Hamilton for comments on the manuscript and A. Arcot for writing answers to the behavioral comprehension questions. This work was supported by the National Institute on Deafness and Other Communication Disorders under award number 1R01DC020088-001 (A.G.H.), the Whitehall Foundation (A.G.H.), the Alfred P. Sloan Foundation (A.G.H.) and the Burroughs Wellcome Fund (A.G.H.).

Author information

Contributions

Conceptualization: J.T. and A.G.H.; Methodology: J.T.; Software and resources: J.T. and S.J.; Investigation and data curation: J.T. and A.L.; Formal analysis and visualization: J.T.; Writing (original draft): J.T.; Writing (review and editing): J.T., A.L., S.J. and A.G.H.; Supervision: A.G.H.

Corresponding author

Correspondence to Alexander G. Huth.

Ethics declarations

Competing interests

A.G.H. and J.T. are inventors on a pending patent application (the applicant is The University of Texas System) that is directly relevant to the language decoding approach used in this work. All other authors declare no competing interests.

Peer review

Peer review information

Nature Neuroscience thanks Gregory Cogan, Stephen David and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Encoding model and word rate model performance.

The two decoder components that interface with fMRI data are the encoding model and the word rate model. (a) Encoding models were evaluated by predicting brain responses to the perceived speech test story and computing the linear correlation between the predicted responses and the actual single-trial responses. Correlations for subject S3 were projected onto a cortical flatmap. The encoding model successfully predicted brain responses in most cortical regions outside of primary sensory and motor areas. (b) Encoding models were trained on different amounts of data. To summarize encoding model performance across cortex, correlations were averaged across the 10,000 voxels used for decoding. Encoding model performance increased with the amount of training data collected from each subject. (c) Encoding models were tested on brain responses that were averaged across different repeats of the perceived speech test story to artificially increase the signal-to-noise ratio (SNR). Encoding model performance increased with the number of averaged responses. (d) Word rate models were trained on different amounts of data. Word rate models were evaluated by predicting the word rate of a test story and computing the linear correlation between the predicted and the actual word rate vectors. Word rate model performance slightly increased with the amount of training data collected from each subject. (e) For brain responses to perceived speech, word rate models fit on auditory cortex significantly outperformed word rate models fit on frontal speech production areas or randomly sampled voxels (* indicates q(FDR) < 0.05 across n = 3 subjects, two-sided paired t-test). (f) For brain responses to imagined speech, there were no significant differences in performance for word rate models fit on different cortical regions. For all results, black lines indicate the mean across subjects and error bars indicate the standard error of the mean (n = 3).
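
To make the evaluation in panel (a) concrete, the following is a minimal sketch of the voxelwise linear correlation between predicted and actual responses. The array shapes, the function name and the `decoding_mask` selector are illustrative assumptions, not the published pipeline.

```python
import numpy as np

def voxelwise_correlation(pred, actual):
    """Pearson correlation per voxel between predicted and actual
    single-trial responses, arrays of shape (n_timepoints, n_voxels)."""
    pred = pred - pred.mean(axis=0)
    actual = actual - actual.mean(axis=0)
    num = (pred * actual).sum(axis=0)
    denom = np.sqrt((pred ** 2).sum(axis=0) * (actual ** 2).sum(axis=0))
    return num / denom

# Cortex-wide summary as in panel (b): average the correlations over
# the 10,000 voxels used for decoding (decoding_mask is hypothetical).
# mean_r = voxelwise_correlation(pred, actual)[decoding_mask].mean()
```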

Extended Data Fig. 2 Perceived and imagined speech identification performance.

Language decoders were trained for subjects S1 and S2 on fMRI responses recorded while the subjects listened to narrative stories. (a) The decoders were evaluated on single-trial fMRI responses recorded while the subjects listened to the perceived speech test story. The color at (i, j) reflects the BERTScore similarity between the ith second of the decoder prediction and the jth second of the actual stimulus. Identification accuracy was significantly higher than expected by chance (P < 0.05, one-sided permutation test). Corresponding results for subject S3 are shown in Fig. 1f in the main text. (b) The decoders were evaluated on single-trial fMRI responses recorded while the subjects imagined telling five 1-minute test stories twice. Decoder predictions were compared to reference transcripts that were separately recorded from the same subjects. Each row corresponds to a scan, and the colors reflect the similarities between the decoder prediction and all five reference transcripts. For each scan, the decoder prediction was most similar to the reference transcript of the correct story (100% identification accuracy). Corresponding results for subject S3 are shown in Fig. 3a in the main text.
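
A window-by-window similarity matrix of this kind could be computed with the public bert-score package, as in the sketch below; the windowing into lists of strings and all variable names are assumptions, not the authors' code.

```python
import numpy as np
from bert_score import score  # pip install bert-score

def similarity_matrix(decoded_windows, stimulus_windows):
    """BERTScore F1 between the ith decoded window and the jth
    stimulus window (both lists of strings)."""
    n = len(decoded_windows)
    sim = np.zeros((n, len(stimulus_windows)))
    for j, ref in enumerate(stimulus_windows):
        _, _, f1 = score(decoded_windows, [ref] * n, lang="en", verbose=False)
        sim[:, j] = f1.numpy()
    return sim

# Identification accuracy: fraction of rows whose diagonal entry is
# the row maximum, i.e. the decoded window best matches the
# temporally corresponding stimulus window.
# acc = (sim.argmax(axis=1) == np.arange(sim.shape[0])).mean()
```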

Extended Data Fig. 3 Behavioral assessment of decoder predictions.

Four 80 s segments were chosen from the perceived speech test story. For each segment, four multiple-choice questions were written based on the actual stimulus words without looking at the decoder predictions (Supplementary Table 7). One hundred subjects were recruited for an online behavioral experiment and randomly assigned to experimental and control groups. For each segment, the experimental group subjects answered the questions after reading the decoded words from subject S3, while the control group subjects answered the questions after reading the actual stimulus words (see Methods). (a) Experimental group scores were significantly higher than expected by chance for 9 out of the 16 questions (* indicates q(FDR) < 0.05, two-sided binomial test). Error bars indicate the bootstrap standard error (n = 1,000 samples). (b) The decoded words and the actual stimulus words for a segment. (c) The multiple-choice questions cover different aspects of the stimulus story.
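
The significance test described here can be sketched with SciPy and statsmodels as follows; the group size and per-question counts are made-up placeholders, and the chance level assumes four answer options per question.

```python
from scipy.stats import binomtest
from statsmodels.stats.multitest import multipletests

n_respondents = 50  # hypothetical group size
correct = [31, 12, 28, 9, 25, 30, 14, 22,
           11, 27, 8, 33, 15, 29, 10, 26]  # made-up counts, 16 questions

# Two-sided binomial test against chance (p = 1/4 for four options),
# then Benjamini-Hochberg FDR correction across the 16 questions.
pvals = [binomtest(k, n=n_respondents, p=0.25).pvalue for k in correct]
significant, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
```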

Extended Data Fig. 4 Decoding across cortical regions.

Cortical regions for subjects S1 and S2. (a) Brain data used for decoding (colored regions) were partitioned into the speech network, the parietal-temporal-occipital association region, and the prefrontal region (PFC). (b) Time course of decoding performance on the perceived speech test story for each region. Horizontal lines indicate when decoder predictions were significantly more similar to the actual stimulus words than expected by chance under the BERTScore metric (q(FDR) < 0.05, one-sided nonparametric test). Corresponding results for subject S3 are shown in Fig. 2a,c in the main text.

Extended Data Fig. 5 Comparison of decoding performance across experiments.

Decoder predictions from different experiments were compared based on the fraction of significantly decoded time-points under the BERTScore metric (q(FDR) < 0.05). The fraction of significantly decoded time-points was used because it does not depend on the length of the stimuli. (a) The decoder successfully recovered 72–82% of time-points during perceived speech, 41–74% of time-points during imagined speech, and 21–45% of time-points during perceived movies. (b) During a multi-speaker stimulus, the decoder successfully recovered 42–68% of time-points from the story told by the female speaker when subjects attended to the female speaker, 0–1% of time-points from the story told by the female speaker when subjects attended to the male speaker, 63–75% of time-points from the story told by the male speaker when subjects attended to the male speaker, and 0–3% of time-points from the story told by the male speaker when subjects attended to the female speaker. (c) During a perceived story, within-subject decoders successfully recovered 65–82% of time-points, volumetric cross-subject decoders successfully recovered 1–2% of time-points, and surface-based cross-subject decoders successfully recovered 1–5% of time-points. (d) During a perceived story, within-subject decoders successfully recovered 52–57% of time-points when subjects passively listened, 4–50% of time-points when subjects resisted by counting by sevens, 0–3% of time-points when subjects resisted by naming animals, and 1–26% of time-points when subjects resisted by imagining a different story.

Extended Data Fig. 6 Cross-subject encoding model and word rate model performance.

For each subject, encoding models and word rate models were trained on anatomically aligned brain responses from 5 sets of other subjects (indicated by markers). The models were evaluated on within-subject single-trial responses to the perceived speech test story. (a) Cross-subject encoding models performed significantly worse than within-subject encoding models (* indicates q(FDR) < 0.05, two-sided t-test). (b) Cross-subject word rate models performed significantly worse than within-subject word rate models (* indicates q(FDR) < 0.05, two-sided t-test).

Extended Data Fig. 7 Decoding performance as a function of training data.

Decoders were trained on different amounts of data and evaluated on the perceived speech test story. (a) The fraction of significantly decoded time-points increased with the amount of training data collected from each subject but plateaued after 7 scanning sessions (7.5 h) and did not substantially increase up to 15 sessions (16 h). The substantial increase up to 7 scanning sessions suggests that decoders can recover certain semantic concepts after training on a small amount of data, but require much more training data to achieve consistently good performance across the test story. (b) The mean identification percentile rank increased with the amount of training data collected from each subject but plateaued after 7 scanning sessions (7.5 h) and did not substantially increase up to 15 sessions (16 h). For all results, black lines indicate the mean across subjects and error bars indicate the standard error of the mean (n = 3).

Extended Data Fig. 8 Decoding performance at lower spatial resolutions.

While fMRI provides high spatial resolution, current MRI scanners are too large and expensive for most practical decoder applications. Portable alternatives like functional near-infrared spectroscopy (fNIRS) measure the same hemodynamic activity as fMRI, albeit at a lower spatial resolution. To simulate how the decoder would perform at lower spatial resolutions, fMRI data were spatially smoothed using Gaussian kernels with standard deviations of 1, 2, 3, 4, and 5 voxels, corresponding to 6.1, 12.2, 18.4, 24.5, and 30.6 mm full width at half maximum (FWHM). The encoding model, noise model, and word rate model were estimated on spatially smoothed training data, and the decoder was evaluated on spatially smoothed responses to the perceived speech test story. (a) fMRI images for each subject were spatially smoothed using progressively larger Gaussian kernels. Blue voxels have above average activity and red voxels have below average activity. (b) Story similarity decreased as the data were spatially smoothed, but remained high at moderate levels of smoothing. (c) The fraction of significantly decoded time-points decreased as the data were spatially smoothed, but remained high at moderate levels of smoothing. (d) Encoding model prediction performance increased as the data were spatially smoothed, demonstrating that decoding performance and encoding model performance are not perfectly coupled. While spatial smoothing reduces information, making it harder to decode the stimulus, it also reduces noise, making it easier to predict the responses. For all results, black lines indicate the mean across subjects and error bars indicate the standard error of the mean (n = 3). Dashed gray lines indicate the estimated spatial resolution of current portable systems (ref. 43). These results show that around 50% of the stimulus time-points could still be decoded at the estimated spatial resolution of current portable systems, and provide a benchmark for how much portable systems need to improve to reach different levels of decoding performance.
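
The kernel widths quoted above follow from FWHM = σ · 2√(2 ln 2) ≈ 2.355σ. A minimal sketch of the smoothing step, assuming 2.6 mm isotropic voxels (which reproduces the 6.1–30.6 mm figures) and a stand-in `volume` array:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

FWHM_PER_SIGMA = 2 * np.sqrt(2 * np.log(2))  # ~2.355
VOXEL_MM = 2.6  # assumed isotropic voxel size

volume = np.random.randn(80, 80, 40)  # stand-in for one fMRI image
for sigma_vox in (1, 2, 3, 4, 5):
    smoothed = gaussian_filter(volume, sigma=sigma_vox)
    fwhm_mm = sigma_vox * VOXEL_MM * FWHM_PER_SIGMA
    print(f"sigma = {sigma_vox} voxels -> {fwhm_mm:.1f} mm FWHM")
# Prints 6.1, 12.2, 18.4, 24.5 and 30.6 mm, matching the caption.
```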

Extended Data Fig. 9 Decoder ablations.

To decode new words, the decoder uses both the autoregressive context (that is, the previously decoded words) and the fMRI data. To understand the relative contributions of the autoregressive context and the fMRI data, decoders were evaluated in the absence of each component. The standard decoding approach was performed up to a cutoff point in the perceived speech test story. After the cutoff, either the autoregressive context was reset or the fMRI data were removed. To reset the autoregressive context, all of the candidate sequences were discarded and the beam was re-initialized with an empty sequence. The standard decoding approach was then performed for the remainder of the scan. To remove the fMRI data, continuations were assigned random likelihoods rather than encoding model likelihoods for the remainder of the scan. (a) A cutoff point was defined 300 s into the stimulus for one subject. When the autoregressive context was reset, decoding performance fell but quickly rebounded. When the fMRI data were removed, decoding performance quickly fell to chance level. The gray shaded region indicates the 5th to 95th percentiles of the null distribution. (b) The ablations were repeated for cutoff points at every 50 s of the stimulus. The performance differences between the original decoder and the ablated decoders were averaged across cutoff points and subjects, yielding profiles of how decoding performance changes after each component is ablated. The blue and purple shaded regions indicate the standard error of the mean (n = 27 trials). These results demonstrate that the decoder continually relies on the encoding model and the fMRI data to achieve good performance, and does not require good initial context. In these figures, each time-point was scored based on the 20 s window ending at that time-point, whereas in all other figures, each time-point was scored based on the 20 s window centered around that time-point. This shifted indexing scheme emphasizes how decoding performance changes after a cutoff. Dashed gray lines indicate cutoff points.
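
To make the two ablations concrete, here is a toy beam-search loop in which, after a cutoff step, either the beam is re-initialized with an empty sequence or continuation scores are drawn at random instead of from the encoding model. `score_fn`, the vocabulary and the beam width are all illustrative stand-ins, not the published decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode(score_fn, vocab, n_steps, cutoff, ablation=None, beam_width=5):
    """Toy beam search; score_fn(seq, word) stands in for the
    encoding-model likelihood of a continuation given the fMRI data."""
    beam = [([], 0.0)]
    for t in range(n_steps):
        if ablation == "reset_context" and t == cutoff:
            beam = [([], 0.0)]       # discard all candidate sequences
        candidates = []
        for seq, logp in beam:
            for w in vocab:
                if ablation == "remove_fmri" and t >= cutoff:
                    s = rng.random()  # random likelihood, fMRI removed
                else:
                    s = score_fn(seq, w)
                candidates.append((seq + [w], logp + s))
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beam[0][0]
```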

Extended Data Fig. 10 Isolated encoding model and language model scores.

The encoding model and the language model were separately evaluated on the perceived speech test story to isolate their contributions to the decoding error (see Methods). At each word time t, the encoding model and the language model were provided with the actual stimulus word and 100 random distractor words. The encoding model ranks the words by the likelihood of the fMRI responses, and the language model ranks the words by the probability given the previous stimulus words. Encoding model and language model scores were computed based on the number of distractor words ranked below the actual word (100 indicates perfect performance, 50 indicates chance level performance). To compare against the decoding scores from Fig. 1e, the word-level scores were averaged across 20 s windows of the stimulus. (a) Encoding model scores were significantly correlated with decoding scores (linear correlation r = 0.22–0.58, P < 0.05), suggesting that many of the poorly decoded time-points in Fig. 1e are inherently more difficult to decode using the encoding model. (b) Language model scores were not significantly correlated with decoding scores. (c) For each word, encoding model scores from 10 sets of distractors were compared to chance level. Most stimulus words with significant encoding model scores (q(FDR) < 0.05, two-sided t-test) for the whole brain also had significant encoding model scores for the speech network (80–87%), association region (88–92%), and prefrontal region (82–85%), suggesting that the results in Fig. 2c were not primarily due to the language model. Word-level encoding model scores were significantly correlated across each pair of regions (q(FDR) < 0.05, two-sided permutation test), suggesting that the results in Fig. 2d were not primarily due to the language model. (d) Word-level encoding model and language model scores were correlated against the word properties tested in Fig. 4d (* indicates q(FDR) < 0.05 for all subjects, two-sided permutation test). The encoding model and the language model were biased in opposite directions for several word properties. These effects may have balanced out in the full decoder, leading to the observed lack of correlation between the word properties and decoding scores (Fig. 4d).
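
A sketch of the distractor-ranking score described above; the likelihood values and the window assignment are hypothetical inputs.

```python
import numpy as np

def word_score(actual_likelihood, distractor_likelihoods):
    """Number of the 100 distractor words ranked below the actual word
    (100 = perfect performance, ~50 = chance level)."""
    return int(np.sum(np.asarray(distractor_likelihoods) < actual_likelihood))

# To compare against the decoding scores in Fig. 1e, average the
# word-level scores within each 20 s stimulus window, e.g.:
# window_scores = [np.mean(s) for s in scores_grouped_by_window]
```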

Supplementary information

Supplementary Information

Supplementary Fig. 1 and Tables 1–7.

Reporting Summary

Supplementary Video

Single-trial BOLD fMRI responses were recorded and decoded while subject S3 watched a self-contained clip from the short film ‘Sintel’ without sound (Blender Foundation; https://www.sintel.org (ref. 48)). Each frame is shown at the time it was presented to the subject, and each decoded word is shown at its predicted time.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tang, J., LeBel, A., Jain, S. et al. Semantic reconstruction of continuous language from non-invasive brain recordings. Nat Neurosci 26, 858–866 (2023). https://doi.org/10.1038/s41593-023-01304-9
