Although smartphones and social media are becoming ever more integral to our lives, verbal communication remains the most common way people interact. However, some people are born without, or may suddenly lose, the ability to talk because of physical disability or disease.
Now, advances in technologies that convert brain activity into voice or text are paving the way for voice-free, hands-free communication via brain signals, which could hugely improve the lives of people with disabilities.
Since the 1990s, neurologists and computer scientists have been using AI to bridge the gap between a person’s thoughts and the actions they wish to take. Brain-computer interfaces (BCIs) use the electrical signals generated when a person imagines something, such as moving their arm, to control an external device, such as a robotic limb.
Brain signals can be measured non-invasively using electroencephalography (EEG), a method that uses electrodes placed on the scalp.
Curiosity about the brain
“As a young student I was always curious about how the brain works,” says Seo-Hyun Lee, a neural engineer from the Department of Brain and Cognitive Engineering at Korea University, in Seoul, South Korea. “But as a PhD student, I want to do more than just uncover its mysteries. By identifying the brain activity triggered by thinking certain words, I hope we can create a technology that will help people who cannot speak, or have lost their ability to speak.”
Converting brain signals to natural speech is challenging because EEG data is extremely noisy, so developing advanced AI that can pick out key features from the data is essential.
There has been some success in generating speech from signals captured by surgically implanted electrodes, or while patients spoke out loud. However, to enable ‘silent conversation’ for broader applications in our lives, such as generating text on a computer without typing on a keyboard, much simpler and less invasive techniques will be crucial. “We are now extremely interested in generating voices without the need to implant electrodes in the brain, and only from imagined speech,” says Seong-Whan Lee, professor of artificial intelligence and brain engineering at Korea University.
‘Imagined speech’ is when someone imagines speaking without making a sound (‘spoken speech’) or miming the words (‘mimed speech’).
Lee’s lab, supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grants*, specializes in pattern recognition and machine learning, with a focus on patterns of brain signals related to speech. “The main aim of our research is using these patterns to analyse what a person is thinking and predict what they want to say,” says Lee. “Current BCI technology is largely text and audio-based, but in the future we hope to use brain signals from imagined speech, as these can be very direct and intuitive.”
A thoughtful approach
Brain-to-speech technology combines several key areas of AI: BCIs; deep-learning tools that capture significant features from complex brain signals; and speech synthesis technology. In such a system, a person imagines saying ‘How are you?’, the EEG records the signals created by this thought, a deep-learning model decodes the features of the message from the brain signals, and finally, the system synthesizes the user’s voice from the extracted features.
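The stages described above can be sketched as a simple chain. The function names, energy-based features, and nearest-template lookup below are illustrative assumptions for clarity, not the team’s actual deep-learning model:

```python
import math

# Hypothetical stage 1: feature extraction. Real systems use deep
# networks; mean signal energy per EEG channel stands in here.
def extract_features(eeg_window):
    """eeg_window: list of per-channel sample lists -> energy per channel."""
    return [sum(s * s for s in ch) / len(ch) for ch in eeg_window]

# Hypothetical stage 2: decoding. A nearest-centroid lookup against
# per-word feature templates stands in for a trained classifier.
def decode_word(features, templates):
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(templates, key=lambda word: dist(features, templates[word]))

# Hypothetical stage 3: synthesis. A real system would drive a vocoder
# trained on the user's voice; a text placeholder stands in here.
def synthesize(word):
    return f"<audio:{word}>"

def brain_to_speech(eeg_window, templates):
    return synthesize(decode_word(extract_features(eeg_window), templates))

# Toy demo: two imagined words with distinct per-channel energy profiles.
templates = {"help me": [1.0, 0.2], "thank you": [0.2, 1.0]}
window = [[1.1, -0.9, 1.0], [0.4, -0.3, 0.5]]  # energies near [1.0, 0.17]
print(brain_to_speech(window, templates))  # → <audio:help me>
```

The point of the sketch is the division of labour: decoding and synthesis are separate stages, so the same decoded word could drive any voice model.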
Seong-Whan Lee, Seo-Hyun Lee and the team at Korea University have developed a brain-to-speech technology that can recognize and generate 12 words from imagined speech signals recorded non-invasively with EEG. This represents a significant step towards brain-to-speech applications, since previous studies had only classified a small number of words, and the team had to overcome significant challenges.
“Recording electrical signals through the scalp makes it trickier to pick out the speech signals, as hair and skin introduce a lot of artefacts, so we had to develop a method of removing them,” says Seo-Hyun Lee. “On the plus side, imagined speech generates fewer artefacts than mimed and spoken speech, because there is no movement involved,” she adds.
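The team’s actual artefact-removal method is not detailed here, but a common baseline in EEG processing is to discard recording epochs whose peak amplitude suggests a muscle or movement artefact. A minimal sketch, with an illustrative threshold:

```python
# Reject EEG epochs whose absolute peak exceeds a threshold, a simple
# stand-in for more sophisticated artefact-removal methods.
def reject_artefact_epochs(epochs, threshold=100.0):
    """Keep only epochs (lists of samples, in microvolts)
    whose absolute peak stays under the threshold."""
    return [ep for ep in epochs if max(abs(s) for s in ep) < threshold]

epochs = [
    [12.0, -8.5, 10.2],     # clean
    [250.0, -180.0, 90.0],  # blink/movement spike -> rejected
    [5.0, 7.5, -6.0],       # clean
]
clean = reject_artefact_epochs(epochs)
print(len(clean))  # → 2
```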
Another issue is that, unlike spoken speech, imagined speech has no vocal record the AI can check against, nor the audio needed to train the speech synthesiser. “We need to identify the exact onset of imagined speech in the brain signals, and match these signals with the corresponding user’s voice,” says Seo-Hyun Lee.
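With no audio track to align against, the onset of imagined speech must be found in the brain signals themselves. One simple stand-in approach (not the team’s published method) is to mark onset at the first sample whose energy exceeds a multiple of a resting baseline:

```python
# Illustrative onset detector: flag the first sample whose squared
# amplitude exceeds a multiple of the baseline energy, estimated
# from the first few (assumed resting) samples.
def find_onset(signal, baseline_len=4, factor=3.0):
    baseline = sum(s * s for s in signal[:baseline_len]) / baseline_len
    for i, s in enumerate(signal):
        if s * s > factor * baseline:
            return i  # sample index where imagined speech begins
    return None  # no onset found

sig = [0.1, -0.1, 0.1, -0.1, 0.1, 0.9, -1.0, 1.1, -0.9, 0.8]
print(find_onset(sig))  # → 5
```

Once an onset index is found per trial, the corresponding stretch of EEG can be paired with a recording of the user saying the same word, giving the synthesiser its training target.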
To uncover these unique signals, Seong-Whan Lee’s team collected a large database of imagined and spoken speech signals using 12 words that are commonly used in patient communication, such as ‘help me’ and ‘thank you’. Participants were fitted with a scalp electrode cap and performed three sessions: repeatedly saying a word out loud, repeatedly imagining saying it, and repeatedly imagining seeing it. The team then used an AI model to look for patterns in the recorded EEG signals and learn which words and sounds to associate with them.
“We observed common features between the two types of speech, such as timing, location and intensity of the electrical signals, as well as similar spatial patterns, with both types of speech lighting up similar areas of the left temporal lobe and frontal cortex,” says Seo-Hyun Lee.
“We further investigated the intrinsic features of imagined speech by comparing the EEG results with visual imagery signals, whereby participants simply imagine a picture of the word, such as a clock,” she adds. “Interestingly, they also showed a significant correlation with each other, with clusters of words creating similar signals.”
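The kind of comparison described, correlating the signal pattern a word evokes under imagined speech with the pattern it evokes under visual imagery, can be illustrated with a Pearson correlation over per-electrode feature vectors. The vectors below are made up for illustration:

```python
# Pearson correlation between two feature vectors, e.g. band power
# at four electrodes for one word under two imagery conditions.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

imagined_speech = [0.8, 0.1, 0.6, 0.2]  # illustrative electrode features
visual_imagery  = [0.7, 0.2, 0.5, 0.3]
print(round(pearson(imagined_speech, visual_imagery), 2))  # → 0.99
```

A value near 1 would indicate that the two imagery conditions light up the electrodes in closely matching proportions, the kind of similarity the team reports.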
Using the optimal features, their model was able to distinguish subtle differences between the signals and successfully learned to recognize the 12 words from imagined speech alone. Moreover, their AI model could translate an EEG recording of imagined speech into synthesized speech in the user’s own voice at the word level.
The team is working on a virtual prototype to highlight the potential of combining AI with information and communications technology, big data, and robotics in smart homes, so people with disabilities will be able to control their heating, lighting, entertainment and appliances simply by thinking.
“We are still some way off synthesising natural sounding full sentences beyond word-level from imagined speech, but we are always finding potential ways to improve our technique,” says Seo-Hyun Lee. She hopes that they can reach this goal in the next decade so they can begin to have a positive impact on patients’ lives.
“People with disabilities or illnesses such as amyotrophic lateral sclerosis (ALS) may eventually lose their ability to speak,” says Seo-Hyun Lee. “Brain-to-speech technology can be used to record them talking while they are still able, so they can speak out in their own voices again one day.”