Machine translation of cortical activity to text with an encoder–decoder framework

Makin, Joseph G.; Moses, David A.; Chang, Edward F.

doi:10.1038/s41593-020-0608-8

Technical Report
Published: 30 March 2020

Nature Neuroscience volume 23, pages 575–582 (2020)

Abstract

A decade after speech was first decoded from human brain signals, accuracy and speed remain far below that of natural speech. Here we show how to decode the electrocorticogram with high accuracy and at natural-speech rates. Taking a cue from recent advances in machine translation, we train a recurrent neural network to encode each sentence-length sequence of neural activity into an abstract representation, and then to decode this representation, word by word, into an English sentence. For each participant, data consist of several spoken repeats of a set of 30–50 sentences, along with the contemporaneous signals from ~250 electrodes distributed over peri-Sylvian cortices. Average word error rates across a held-out repeat set are as low as 3%. Finally, we show how decoding with limited data can be improved with transfer learning, by training certain layers of the network under multiple participants’ data.
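
The word error rate (WER) quoted above is conventionally the word-level edit distance between the decoded and reference sentences, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name and the example sentences are hypothetical, and how the authors averaged WER across held-out repeats is not stated in this excerpt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Standard WER definition; assumes whitespace tokenization, which is an
    illustrative choice and not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of eight reference words is dropped.
reference = "the birch canoe slid on the smooth planks"
decoded = "the birch canoe slid on smooth planks"
print(word_error_rate(reference, decoded))  # 0.125
```

Under this definition, a WER of 3% corresponds to roughly one word error per 33 reference words.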

**Fig. 2: WERs of the decoded sentences.**

**Fig. 3: WER of the decoded MOCHA-1 sentences for encoder–decoder models trained with transfer learning.**

**Fig. 4: The contributions of each anatomical area to decoding, as measured by the gradient of the loss function with respect to the input data (see “Anatomical contributions” for details).**

**Fig. 5: Electrode coverage and contributions.**

**Fig. 6: Graphical model for the decoding process.**

Data availability

Deidentified copies of the data used in this study will be provided upon reasonable request. Please contact E.F.C. via e-mail with any inquiries. Source data for the figures are likewise available upon request; please contact J.G.M. via e-mail with inquiries.

Code availability

The code used to train and test the encoder–decoders is available at https://github.com/jgmakin/machine_learning. Code used to assemble data and generate figures is also available upon reasonable request; please contact J.G.M. via e-mail with any inquiries.

Acknowledgements

The project was funded by a research contract under Facebook’s Sponsored Academic Research Agreement. Data were collected and preprocessed by members of the Chang laboratory, some (MOCHA-TIMIT) under NIH grant no. U01 NS098971. Some neural networks were trained using GPUs generously donated by the Nvidia Corporation. We thank M. Leonard, B. Dichter and P. Hullett for comments on a draft of the manuscript and thank J. Burke for suggesting bipolar referencing.

Author information

Authors and Affiliations

Center for Integrative Neuroscience, UCSF, San Francisco, CA, USA
Joseph G. Makin, David A. Moses & Edward F. Chang
Department of Neurological Surgery, UCSF, San Francisco, CA, USA
Joseph G. Makin, David A. Moses & Edward F. Chang

Authors

Joseph G. Makin
David A. Moses
Edward F. Chang

Contributions

J.G.M. conceived and implemented the decoder and all analyses thereof, except the comparison to the phoneme-based decoder, which was conceived and implemented by D.A.M. E.F.C. led the research project. J.G.M. wrote the manuscript with input from all authors.

Corresponding authors

Correspondence to Joseph G. Makin or Edward F. Chang.

Ethics declarations

Competing interests

This work was funded in part by Facebook Reality Labs. UCSF holds patents related to speech decoding.

Supplementary information

Supplementary Information

Supplementary Fig. 1 and Supplementary Tables 1–6.

Reporting Summary

About this article

Cite this article

Makin, J.G., Moses, D.A. & Chang, E.F. Machine translation of cortical activity to text with an encoder–decoder framework. Nat Neurosci 23, 575–582 (2020). https://doi.org/10.1038/s41593-020-0608-8

Received: 23 August 2019
Accepted: 10 February 2020
Published: 30 March 2020
Issue Date: April 2020
DOI: https://doi.org/10.1038/s41593-020-0608-8
