Semantic reconstruction of continuous language from non-invasive brain recordings

Abstract

A brain–computer interface that decodes continuous language from non-invasive recordings would have many scientific and practical applications. Currently, however, non-invasive language decoders can only identify stimuli from among a small set of words or phrases. Here we introduce a non-invasive decoder that reconstructs continuous language from cortical semantic representations recorded using functional magnetic resonance imaging (fMRI). Given novel brain recordings, this decoder generates intelligible word sequences that recover the meaning of perceived speech, imagined speech and even silent videos, demonstrating that a single decoder can be applied to a range of tasks. We tested the decoder across cortex and found that continuous language can be separately decoded from multiple regions. As brain–computer interfaces should respect mental privacy, we tested whether successful decoding requires subject cooperation and found that subject cooperation is required both to train and to apply the decoder. Our findings demonstrate the viability of non-invasive language brain–computer interfaces.

Fig. 1: Language decoder.
Fig. 2: Decoding across cortical regions.
Fig. 3: Decoder applications and privacy implications.
Fig. 4: Sources of decoding error.

Data availability

Data collected during the decoder resistance experiment are available upon reasonable request but were not publicly released due to concern that the data could be used to discover ways to bypass subject resistance. All other data are available at https://openneuro.org/datasets/ds003020 and https://openneuro.org/datasets/ds004510.

Code availability

Custom decoding code is available at https://github.com/HuthLab/semantic-decoding.

References

  1. Anumanchipalli, G. K., Chartier, J. & Chang, E. F. Speech synthesis from neural decoding of spoken sentences. Nature 568, 493–498 (2019).

  2. Pasley, B. N. et al. Reconstructing speech from human auditory cortex. PLoS Biol. 10, e1001251 (2012).

  3. Willett, F. R., Avansino, D. T., Hochberg, L. R., Henderson, J. M. & Shenoy, K. V. High-performance brain-to-text communication via handwriting. Nature 593, 249–254 (2021).

  4. Moses, D. A. et al. Neuroprosthesis for decoding speech in a paralyzed person with anarthria. N. Engl. J. Med. 385, 217–227 (2021).

  5. Huth, A. G., de Heer, W. A., Griffiths, T. L., Theunissen, F. E. & Gallant, J. L. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 532, 453–458 (2016).

  6. de Heer, W. A., Huth, A. G., Griffiths, T. L., Gallant, J. L. & Theunissen, F. E. The hierarchical cortical organization of human speech processing. J. Neurosci. 37, 6539–6557 (2017).

  7. Broderick, M. P., Anderson, A. J., Di Liberto, G. M., Crosse, M. J. & Lalor, E. C. Electrophysiological correlates of semantic dissimilarity reflect the comprehension of natural, narrative speech. Curr. Biol. 28, 803–809 (2018).

  8. Caucheteux, C. & King, J.-R. Brains and algorithms partially converge in natural language processing. Commun. Biol. 5, 134 (2022).

  9. Farwell, L. A. & Donchin, E. Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalogr. Clin. Neurophysiol. 70, 510–523 (1988).

  10. Mitchell, T. M. et al. Predicting human brain activity associated with the meanings of nouns. Science 320, 1191–1195 (2008).

  11. Pereira, F. et al. Toward a universal decoder of linguistic meaning from brain activation. Nat. Commun. 9, 963 (2018).

  12. Dash, D., Ferrari, P. & Wang, J. Decoding imagined and spoken phrases from non-invasive neural (MEG) signals. Front. Neurosci. 14, 290 (2020).

  13. Logothetis, N. K. The underpinnings of the BOLD functional magnetic resonance imaging signal. J. Neurosci. 23, 3963–3971 (2003).

  14. Jain, S. & Huth, A. G. Incorporating context into language encoding models for fMRI. In Advances in Neural Information Processing Systems 31 6629–6638 (NeurIPS, 2018).

  15. Toneva, M. & Wehbe, L. Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain). In Advances in Neural Information Processing Systems 32 14928–14938 (NeurIPS, 2019).

  16. Schrimpf, M. et al. The neural architecture of language: integrative modeling converges on predictive processing. Proc. Natl Acad. Sci. USA 118, e2105646118 (2021).

  17. LeBel, A., Jain, S. & Huth, A. G. Voxelwise encoding models show that cerebellar language representations are highly conceptual. J. Neurosci. 41, 10341–10355 (2021).

  18. Naselaris, T., Prenger, R. J., Kay, K. N., Oliver, M. & Gallant, J. L. Bayesian reconstruction of natural images from human brain activity. Neuron 63, 902–915 (2009).

  19. Nishimoto, S. et al. Reconstructing visual experiences from brain activity evoked by natural movies. Curr. Biol. 21, 1641–1646 (2011).

  20. Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. Improving language understanding by generative pre-training. Preprint at OpenAI https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (2018).

  21. Tillmann, C. & Ney, H. Word reordering and a dynamic programming beam search algorithm for statistical machine translation. Comput. Linguist. 29, 97–133 (2003).

  22. Lerner, Y., Honey, C. J., Silbert, L. J. & Hasson, U. Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. J. Neurosci. 31, 2906–2915 (2011).

  23. Binder, J. R. & Desai, R. H. The neurobiology of semantic memory. Trends Cogn. Sci. 15, 527–536 (2011).

  24. Deniz, F., Nunez-Elizalde, A. O., Huth, A. G. & Gallant, J. L. The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. J. Neurosci. 39, 7722–7736 (2019).

  25. Gauthier, J. & Ivanova, A. Does the brain represent words? An evaluation of brain decoding studies of language understanding. In 2018 Conference on Cognitive Computational Neuroscience 1–4 (CCN, 2018).

  26. Fedorenko, E. & Thompson-Schill, S. L. Reworking the language network. Trends Cogn. Sci. 18, 120–126 (2014).

  27. Fodor, J. A. The Modularity of Mind (MIT Press, 1983).

  28. Keller, T. A., Carpenter, P. A. & Just, M. A. The neural bases of sentence comprehension: a fMRI examination of syntactic and lexical processing. Cereb. Cortex 11, 223–237 (2001).

  29. Geschwind, N. The organization of language and the brain. Science 170, 940–944 (1970).

  30. Barsalou, L. W. Grounded cognition. Annu. Rev. Psychol. 59, 617–645 (2008).

  31. Bunzeck, N., Wuestenberg, T., Lutz, K., Heinze, H.-J. & Jancke, L. Scanning silence: mental imagery of complex sounds. Neuroimage 26, 1119–1127 (2005).

  32. Martin, S. et al. Decoding spectrotemporal features of overt and covert speech from the human cortex. Front. Neuroeng. 7, 14 (2014).

  33. Naselaris, T., Olman, C. A., Stansbury, D. E., Ugurbil, K. & Gallant, J. L. A voxel-wise encoding model for early visual areas decodes mental images of remembered scenes. Neuroimage 105, 215–228 (2015).

  34. Silbert, L. J., Honey, C. J., Simony, E., Poeppel, D. & Hasson, U. Coupled neural systems underlie the production and comprehension of naturalistic narrative speech. Proc. Natl Acad. Sci. USA 111, E4687–E4696 (2014).

  35. Fairhall, S. L. & Caramazza, A. Brain regions that represent amodal conceptual knowledge. J. Neurosci. 33, 10552–10558 (2013).

  36. Popham, S. F. et al. Visual and linguistic semantic representations are aligned at the border of human visual cortex. Nat. Neurosci. 24, 1628–1636 (2021).

  37. Çukur, T., Nishimoto, S., Huth, A. G. & Gallant, J. L. Attention during natural vision warps semantic representation across the human brain. Nat. Neurosci. 16, 763–770 (2013).

  38. Kiremitçi, I. et al. Attentional modulation of hierarchical speech representations in a multitalker environment. Cereb. Cortex 31, 4986–5005 (2021).

  39. Mesgarani, N. & Chang, E. F. Selective cortical representation of attended speaker in multi-talker speech perception. Nature 485, 233–236 (2012).

  40. Horikawa, T. & Kamitani, Y. Attention modulates neural representation to render reconstructions according to subjective appearance. Commun. Biol. 5, 34 (2022).

  41. Rainey, S., Martin, S., Christen, A., Mégevand, P. & Fourneret, E. Brain recording, mind-reading, and neurotechnology: ethical issues from consumer devices to brain-based speech decoding. Sci. Eng. Ethics 26, 2295–2311 (2020).

  42. Kaplan, J. et al. Scaling laws for neural language models. Preprint at arxiv https://doi.org/10.48550/arXiv.2001.08361 (2020).

  43. White, B. R. & Culver, J. P. Quantitative evaluation of high-density diffuse optical tomography: in vivo resolution and mapping performance. J. Biomed. Opt. 15, 026006 (2010).

  44. Eggebrecht, A. T. et al. A quantitative spatial comparison of high-density diffuse optical tomography and fMRI cortical mapping. Neuroimage 61, 1120–1128 (2012).

  45. Makin, J. G., Moses, D. A. & Chang, E. F. Machine translation of cortical activity to text with an encoder–decoder framework. Nat. Neurosci. 23, 575–582 (2020).

  46. Orsborn, A. L. et al. Closed-loop decoder adaptation shapes neural plasticity for skillful neuroprosthetic control. Neuron 82, 1380–1393 (2014).

  47. Goering, S. et al. Recommendations for responsible development and application of neurotechnologies. Neuroethics 14, 365–386 (2021).

  48. Levy, C. Sintel (Blender Foundation, 2010).

  49. Fedorenko, E., Hsieh, P.-J., Nieto-Castañón, A., Whitfield-Gabrieli, S. & Kanwisher, N. New method for fMRI investigations of language: defining ROIs functionally in individual subjects. J. Neurophysiol. 104, 1177–1194 (2010).

  50. Yuan, J. & Liberman, M. Speaker identification on the SCOTUS corpus. J. Acoust. Soc. Am. 123, 3878 (2008).

  51. Boersma, P. & Weenink, D. Praat: doing phonetics by computer (University of Amsterdam, 2014).

  52. Casarosa, E. La Luna (Walt Disney Pictures; Pixar Animation Studios, 2011).

  53. Sweetland, D. Presto (Walt Disney Pictures; Pixar Animation Studios, 2008).

  54. Sohn, P. Partly Cloudy (Walt Disney Pictures; Pixar Animation Studios, 2009).

  55. Jenkinson, M. & Smith, S. A global optimisation method for robust affine registration of brain images. Med. Image Anal. 5, 143–156 (2001).

  56. Dale, A. M., Fischl, B. & Sereno, M. I. Cortical surface-based analysis. I. Segmentation and surface reconstruction. Neuroimage 9, 179–194 (1999).

  57. Gao, J. S., Huth, A. G., Lescroart, M. D. & Gallant, J. L. Pycortex: an interactive surface visualizer for fMRI. Front. Neuroinform. 9, 23 (2015).

  58. Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).

  59. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).

  60. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 8024–8035 (NeurIPS, 2019).

  61. Wolf, T. et al. Transformers: state-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations 38–45 (Association for Computational Linguistics, 2020).

  62. Holtzman, A., Buys, J., Du, L., Forbes, M. & Choi, Y. The curious case of neural text degeneration. In 8th International Conference on Learning Representations 1–16 (ICLR, 2020).

  63. Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics 311–318 (Association for Computational Linguistics, 2002).

  64. Banerjee, S. & Lavie, A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization 65–72 (Association for Computational Linguistics, 2005).

  65. Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q. & Artzi, Y. BERTScore: evaluating text generation with BERT. In 8th International Conference on Learning Representations 1–43 (ICLR, 2020).

  66. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 57, 289–300 (1995).

  67. Faul, F., Erdfelder, E., Lang, A.-G. & Buchner, A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39, 175–191 (2007).

  68. Pennington, J., Socher, R. & Manning, C. D. GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing 1532–1543 (Association for Computational Linguistics, 2014).

  69. Warriner, A. B., Kuperman, V. & Brysbaert, M. Norms of valence, arousal, and dominance for 13,915 English lemmas. Behav. Res. Methods 45, 1191–1207 (2013).

  70. Brysbaert, M., Warriner, A. B. & Kuperman, V. Concreteness ratings for 40 thousand generally known English word lemmas. Behav. Res. Methods 46, 904–911 (2014).

  71. Levy, R. Expectation-based syntactic comprehension. Cognition 106, 1126–1177 (2008).

  72. Fischl, B., Sereno, M. I., Tootell, R. B. H. & Dale, A. M. High-resolution intersubject averaging and a coordinate system for the cortical surface. Hum. Brain Mapp. 8, 272–284 (1999).

Acknowledgements

We thank J. Wang, X. X. Wei and L. Hamilton for comments on the manuscript and A. Arcot for writing answers to the behavioral comprehension questions. This work was supported by the National Institute on Deafness and Other Communication Disorders under award number 1R01DC020088-001 (A.G.H.), the Whitehall Foundation (A.G.H.), the Alfred P. Sloan Foundation (A.G.H.) and the Burroughs Wellcome Fund (A.G.H.).

Author information

Contributions

Conceptualization: J.T. and A.G.H.; Methodology: J.T.; Software and resources: J.T. and S.J.; Investigation and data curation: J.T. and A.L.; Formal analysis and visualization: J.T.; Writing (original draft): J.T.; Writing (review and editing): J.T., A.L., S.J. and A.G.H.; Supervision: A.G.H.

Corresponding author

Correspondence to Alexander G. Huth.

Ethics declarations

Competing interests

A.G.H. and J.T. are inventors on a pending patent application (the applicant is The University of Texas System) that is directly relevant to the language decoding approach used in this work. All other authors declare no competing interests.

Peer review

Peer review information

Nature Neuroscience thanks Gregory Cogan, Stephen David and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Encoding model and word rate model performance.

The two decoder components that interface with fMRI data are the encoding model and the word rate model. (a) Encoding models were evaluated by predicting brain responses to the perceived speech test story and computing the linear correlation between the predicted responses and the actual single-trial responses. Correlations for subject S3 were projected onto a cortical flatmap. The encoding model successfully predicted brain responses in most cortical regions outside of primary sensory and motor areas. (b) Encoding models were trained on different amounts of data. To summarize encoding model performance across cortex, correlations were averaged across the 10,000 voxels used for decoding. Encoding model performance increased with the amount of training data collected from each subject. (c) Encoding models were tested on brain responses that were averaged across different repeats of the perceived speech test story to artificially increase the signal-to-noise ratio (SNR). Encoding model performance increased with the number of averaged responses. (d) Word rate models were trained on different amounts of data. Word rate models were evaluated by predicting the word rate of a test story and computing the linear correlation between the predicted and the actual word rate vectors. Word rate model performance slightly increased with the amount of training data collected from each subject. (e) For brain responses to perceived speech, word rate models fit on auditory cortex significantly outperformed word rate models fit on frontal speech production areas or randomly sampled voxels (* indicates q(FDR) < 0.05 across n = 3 subjects, two-sided paired t-test). (f) For brain responses to imagined speech, there were no significant differences in performance for word rate models fit on different cortical regions. For all results, black lines indicate the mean across subjects and error bars indicate the standard error of the mean (n = 3).
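The voxelwise evaluation described above — linear correlation between predicted and actual responses, averaged across the decoding voxels — can be sketched as follows. The data and function name here are illustrative, not the authors' code:

```python
import numpy as np

def voxelwise_correlation(pred, actual):
    """Pearson correlation per voxel (columns) between predicted and
    actual response matrices of shape (time, voxels)."""
    pred = pred - pred.mean(axis=0)
    actual = actual - actual.mean(axis=0)
    num = (pred * actual).sum(axis=0)
    den = np.sqrt((pred ** 2).sum(axis=0) * (actual ** 2).sum(axis=0))
    return num / den

# toy example: 3 voxels, 50 time-points, noisy predictions
rng = np.random.default_rng(0)
actual = rng.standard_normal((50, 3))
pred = actual + 0.5 * rng.standard_normal((50, 3))
r = voxelwise_correlation(pred, actual)
mean_r = r.mean()  # summary across voxels, as in panel (b)
```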

Extended Data Fig. 2 Perceived and imagined speech identification performance.

Language decoders were trained for subjects S1 and S2 on fMRI responses recorded while the subjects listened to narrative stories. (a) The decoders were evaluated on single-trial fMRI responses recorded while the subjects listened to the perceived speech test story. The color at (i, j) reflects the BERTScore similarity between the ith second of the decoder prediction and the jth second of the actual stimulus. Identification accuracy was significantly higher than expected by chance (P < 0.05, one-sided permutation test). Corresponding results for subject S3 are shown in Fig. 1f in the main text. (b) The decoders were evaluated on single-trial fMRI responses recorded while the subjects imagined telling five 1-minute test stories twice. Decoder predictions were compared to reference transcripts that were separately recorded from the same subjects. Each row corresponds to a scan, and the colors reflect the similarities between the decoder prediction and all five reference transcripts. For each scan, the decoder prediction was most similar to the reference transcript of the correct story (100% identification accuracy). Corresponding results for subject S3 are shown in Fig. 3a in the main text.
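The identification analysis above compares each prediction to every reference and asks whether the correct (diagonal) entry scores highest in its row. A minimal sketch, with a toy similarity matrix standing in for the BERTScore values:

```python
import numpy as np

def identification_accuracy(sim):
    """Fraction of predictions identified correctly: prediction i is
    correct if sim[i, i] is the maximum of row i, where sim[i, j] is the
    similarity between prediction i and reference j."""
    correct = np.argmax(sim, axis=1) == np.arange(sim.shape[0])
    return correct.mean()

# toy matrix for 3 predictions vs 3 references; prediction 2 is
# most similar to the wrong reference and is misidentified
sim = np.array([[0.9, 0.2, 0.1],
                [0.3, 0.8, 0.2],
                [0.4, 0.5, 0.3]])
acc = identification_accuracy(sim)  # → 2/3
```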

Extended Data Fig. 3 Behavioral assessment of decoder predictions.

Four 80 s segments were chosen from the perceived speech test story. For each segment, four multiple-choice questions were written based on the actual stimulus words without looking at the decoder predictions (Supplementary Table 7). 100 subjects were recruited for an online behavioral experiment and randomly assigned to experimental and control groups. For each segment, the experimental group subjects answered the questions after reading the decoded words from subject S3, while the control group subjects answered the questions after reading the actual stimulus words (see Methods). (a) Experimental group scores were significantly higher than expected by chance for 9 out of the 16 questions (* indicates q(FDR) < 0.05, two-sided binomial test). Error bars indicate the bootstrap standard error (n = 1,000 samples). (b) The decoded words and the actual stimulus words for a segment. (c) The multiple-choice questions cover different aspects of the stimulus story.
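Chance-level performance on a four-choice question can be assessed with an exact two-sided binomial test like the one cited in the caption; a stdlib-only sketch with hypothetical counts (the FDR correction applied in the paper is omitted here):

```python
from math import comb

def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial test: sum the probabilities of all
    outcomes no more likely than observing k successes out of n."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    return sum(q for q in pmf if q <= pmf[k] * (1 + 1e-12))

# hypothetical: 20 of 25 subjects answer a 4-choice question
# correctly; chance is p = 0.25
p_val = binom_two_sided_p(20, 25, 0.25)  # far below 0.05
```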

Extended Data Fig. 4 Decoding across cortical regions.

Cortical regions for subjects S1 and S2. (a) Brain data used for decoding (colored regions) were partitioned into the speech network, the parietal-temporal-occipital association region, and the prefrontal region (PFC). (b) Decoding performance time-course for the perceived speech test story from each region. Horizontal lines indicate when decoder predictions were significantly more similar to the actual stimulus words than expected by chance under the BERTScore metric (q(FDR) < 0.05, one-sided nonparametric test). Corresponding results for subject S3 are shown in Fig. 2a,c in the main text.

Extended Data Fig. 5 Comparison of decoding performance across experiments.

Decoder predictions from different experiments were compared based on the fraction of significantly decoded time-points under the BERTScore metric (q(FDR) < 0.05). The fraction of significantly decoded time-points was used because it does not depend on the length of the stimuli. (a) The decoder successfully recovered 72–82% of time-points during perceived speech, 41–74% of time-points during imagined speech, and 21–45% of time-points during perceived movies. (b) During a multi-speaker stimulus, the decoder successfully recovered 42–68% of time-points told by the female speaker when subjects attended to the female speaker, 0–1% of time-points told by the female speaker when subjects attended to the male speaker, 63–75% of time-points told by the male speaker when subjects attended to the male speaker, and 0–3% of time-points told by the male speaker when subjects attended to the female speaker. (c) During a perceived story, within-subject decoders successfully recovered 65–82% of time-points, volumetric cross-subject decoders successfully recovered 1–2% of time-points, and surface-based cross-subject decoders successfully recovered 1–5% of time-points. (d) During a perceived story, within-subject decoders successfully recovered 52–57% of time-points when subjects passively listened, 4–50% of time-points when subjects resisted by counting by sevens, 0–3% of time-points when subjects resisted by naming animals, and 1–26% of time-points when subjects resisted by imagining a different story.

Extended Data Fig. 6 Cross-subject encoding model and word rate model performance.

For each subject, encoding models and word rate models were trained on anatomically aligned brain responses from 5 sets of other subjects (indicated by markers). The models were evaluated on within-subject single-trial responses to the perceived speech test story. (a) Cross-subject encoding models performed significantly worse than within-subject encoding models (* indicates q(FDR) < 0.05, two-sided t-test). (b) Cross-subject word rate models performed significantly worse than within-subject word rate models (* indicates q(FDR) < 0.05, two-sided t-test).
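The q(FDR) thresholds used in these significance tests come from the Benjamini–Hochberg procedure (ref. 66); a minimal sketch, with illustrative p-values:

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg procedure: boolean mask of which hypotheses
    are rejected at false discovery rate alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m   # step-up thresholds i/m * alpha
    below = p[order] <= thresh
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest rank meeting its threshold
        rejected[order[:k + 1]] = True         # reject it and all smaller p-values
    return rejected

p_vals = [0.001, 0.008, 0.039, 0.041, 0.60]
mask = benjamini_hochberg(p_vals, alpha=0.05)  # only the first two survive
```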

Extended Data Fig. 7 Decoding performance as a function of training data.

Decoders were trained on different amounts of data and evaluated on the perceived speech test story. (a) The fraction of significantly decoded time-points increased with the amount of training data collected from each subject but plateaued after 7 scanning sessions (7.5 h) and did not substantially increase up to 15 sessions (16 h). The substantial increase up to 7 scanning sessions suggests that decoders can recover certain semantic concepts after training on a small amount of data, but require much more training data to achieve consistently good performance across the test story. (b) The mean identification percentile rank increased with the amount of training data collected from each subject but plateaued after 7 scanning sessions (7.5 h) and did not substantially increase up to 15 sessions (16 h). For all results, black lines indicate the mean across subjects and error bars indicate the standard error of the mean (n = 3).

Extended Data Fig. 8 Decoding performance at lower spatial resolutions.

While fMRI provides high spatial resolution, current MRI scanners are too large and expensive for most practical decoder applications. Portable alternatives like functional near-infrared spectroscopy (fNIRS) measure the same hemodynamic activity as fMRI, albeit at a lower spatial resolution. To simulate how the decoder would perform at lower spatial resolutions, fMRI data were spatially smoothed using Gaussian kernels with standard deviations of 1, 2, 3, 4, and 5 voxels, corresponding to 6.1, 12.2, 18.4, 24.5, and 30.6 mm full width at half maximum (FWHM). The encoding model, noise model, and word rate model were estimated on spatially smoothed training data, and the decoder was evaluated on spatially smoothed responses to the perceived speech test story. (a) fMRI images for each subject were spatially smoothed using progressively larger Gaussian kernels. Blue voxels have above average activity and red voxels have below average activity. (b) Story similarity decreased as the data were spatially smoothed, but remained high at moderate levels of smoothing. (c) The fraction of significantly decoded time-points decreased as the data were spatially smoothed, but remained high at moderate levels of smoothing. (d) Encoding model prediction performance increased as the data were spatially smoothed, demonstrating that decoding performance and encoding model performance are not perfectly coupled. While spatial smoothing reduces information, making it harder to decode the stimulus, it also reduces noise, making it easier to predict the responses. For all results, black lines indicate the mean across subjects and error bars indicate the standard error of the mean (n = 3). Dashed gray lines indicate the estimated spatial resolution of current portable systems (ref. 43). These results show that around 50% of the stimulus time-points could still be decoded at the estimated spatial resolution of current portable systems, and provide a benchmark for how much portable systems need to improve to reach different levels of decoding performance.
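The kernel widths and the reported FWHM values are related by FWHM = 2√(2 ln 2)·σ ≈ 2.355σ. A quick check of the numbers, assuming isotropic voxels of about 2.6 mm (a value inferred from the reported FWHMs, not stated in the caption):

```python
import numpy as np

SIGMA_TO_FWHM = 2 * np.sqrt(2 * np.log(2))  # ≈ 2.3548

sigmas_vox = np.array([1, 2, 3, 4, 5])  # kernel std devs in voxels, per the caption
voxel_mm = 2.6                          # assumed isotropic voxel size
fwhm_mm = sigmas_vox * voxel_mm * SIGMA_TO_FWHM
# rounds to [6.1, 12.2, 18.4, 24.5, 30.6] mm, matching the caption
```

In practice, `scipy.ndimage.gaussian_filter(volume, sigma)` applies the corresponding smoothing to a volume; whether the authors used exactly that routine is an assumption here.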

Extended Data Fig. 9 Decoder ablations.

To decode new words, the decoder uses both the autoregressive context (that is, the previously decoded words) and the fMRI data. To understand the relative contributions of the autoregressive context and the fMRI data, decoders were evaluated in the absence of each component. The standard decoding approach was performed up to a cutoff point in the perceived speech test story. After the cutoff, either the autoregressive context was reset or the fMRI data were removed. To reset the autoregressive context, all of the candidate sequences were discarded and the beam was re-initialized with an empty sequence. The standard decoding approach was then performed for the remainder of the scan. To remove the fMRI data, continuations were assigned random likelihoods rather than encoding model likelihoods for the remainder of the scan. (a) A cutoff point was defined 300 s into the stimulus for one subject. When the autoregressive context was reset, decoding performance fell but quickly rebounded. When the fMRI data were removed, decoding performance quickly fell to chance level. The gray shaded region indicates the 5th to 95th percentiles of the null distribution. (b) The ablations were repeated for cutoff points at every 50 s of the stimulus. The performance differences between the original decoder and the ablated decoders were averaged across cutoff points and subjects, yielding profiles of how decoding performance changes after each component is ablated. The blue and purple shaded regions indicate the standard error of the mean (n = 27 trials). These results demonstrate that the decoder continually relies on the encoding model and the fMRI data to achieve good performance, and does not require good initial context. In these figures, each time-point was scored based on the 20 s window ending at that time-point, whereas in all other figures, each time-point was scored based on the 20 s window centered around that time-point. This shifted indexing scheme emphasizes how decoding performance changes after a cutoff. Dashed gray lines indicate cutoff points.
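The decoding procedure described here — beam search over candidate word sequences scored by likelihood — can be illustrated with a toy vocabulary and a stub scorer. Replacing the scorer with random values mimics the fMRI-removal ablation; all names and the scorer are illustrative, not the authors' implementation:

```python
import random

def beam_search(score_fn, vocab, n_steps, beam_width=3, seed_context=()):
    """Toy word-level beam search: at each step, extend every candidate
    sequence with every vocabulary word and keep the top-scoring beams.
    score_fn(context, word) returns a log-likelihood for the extension."""
    beams = [(list(seed_context), 0.0)]
    for _ in range(n_steps):
        candidates = []
        for seq, score in beams:
            for w in vocab:
                candidates.append((seq + [w], score + score_fn(seq, w)))
        candidates.sort(key=lambda b: b[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# stub scorer that prefers continuing the sequence "the cat sat"
target = ["the", "cat", "sat"]
def stub_score(context, word):
    i = len(context)
    return 0.0 if i < len(target) and word == target[i] else -1.0

vocab = ["the", "cat", "sat", "dog"]
best_seq, best_score = beam_search(stub_score, vocab, n_steps=3)[0]
# → best_seq == ["the", "cat", "sat"]

# ablation analogue: a random scorer removes the informative signal,
# so the decoded sequence falls to chance
rng = random.Random(0)
random_beams = beam_search(lambda c, w: rng.random(), vocab, n_steps=3)
```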

Extended Data Fig. 10 Isolated encoding model and language model scores.

The encoding model and the language model were separately evaluated on the perceived speech test story to isolate their contributions to the decoding error (see Methods). At each word time t, the encoding model and the language model were provided with the actual stimulus word and 100 random distractor words. The encoding model ranks the words by the likelihood of the fMRI responses, and the language model ranks the words by the probability given the previous stimulus words. Encoding model and language model scores were computed based on the number of distractor words ranked below the actual word (100 indicates perfect performance, 50 indicates chance level performance). To compare against the decoding scores from Fig. 1e, the word-level scores were averaged across 20 s windows of the stimulus. (a) Encoding model scores were significantly correlated with decoding scores (linear correlation r = 0.22–0.58, P < 0.05), suggesting that many of the poorly decoded time-points in Fig. 1e are inherently more difficult to decode using the encoding model. (b) Language model scores were not significantly correlated with decoding scores. (c) For each word, encoding model scores from 10 sets of distractors were compared to chance level. Most stimulus words with significant encoding model scores (q(FDR) < 0.05, two-sided t-test) for the whole brain also had significant encoding model scores for the speech network (80–87%), association region (88–92%), and prefrontal region (82–85%), suggesting that the results in Fig. 2c were not primarily due to the language model. Word-level encoding model scores were significantly correlated across each pair of regions (q(FDR) < 0.05, two-sided permutation test), suggesting that the results in Fig. 2d were not primarily due to the language model. (d) Word-level encoding model and language model scores were correlated against the word properties tested in Fig. 4d (* indicates q(FDR) < 0.05 for all subjects, two-sided permutation test). The encoding model and the language model were biased in opposite directions for several word properties. These effects may have balanced out in the full decoder, leading to the observed lack of correlation between the word properties and decoding scores (Fig. 4d).
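The distractor-ranking score described above — the number of 100 distractors ranked below the actual word, so 100 is perfect and roughly 50 is chance — can be sketched as follows, with synthetic likelihood values:

```python
import numpy as np

def rank_score(actual_ll, distractor_lls):
    """Score a word as the percentage of distractors whose likelihood
    falls below the actual word's (100 = perfect, ~50 = chance)."""
    distractor_lls = np.asarray(distractor_lls)
    return 100.0 * (distractor_lls < actual_ll).mean()

rng = np.random.default_rng(0)
distractors = rng.standard_normal(100)  # 100 random distractor likelihoods
chance = rank_score(rng.standard_normal(), distractors)     # ~50 on average
perfect = rank_score(distractors.max() + 1.0, distractors)  # → 100.0
```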

Supplementary information

Supplementary Information

Supplementary Fig. 1 and Tables 1–7.

Reporting Summary

Supplementary Video

Single-trial BOLD fMRI responses were recorded and decoded while subject S3 watched a self-contained clip from the short film ‘Sintel’ without sound (Blender Foundation; https://www.sintel.org (ref. 48)). Each frame is shown at the time it was presented to the subject, and each decoded word is shown at its predicted time.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tang, J., LeBel, A., Jain, S. et al. Semantic reconstruction of continuous language from non-invasive brain recordings. Nat Neurosci 26, 858–866 (2023). https://doi.org/10.1038/s41593-023-01304-9
