Using a neural network to decode the neural signals produced in the relevant brain regions when people speak, and then using a recurrent neural network to synthesize those signals into speech, could help patients with speech disorders communicate again.

Mind-reading may really be on the way.

Speaking comes naturally to most people. But many people around the world suffer from stroke, traumatic brain injury, or neurodegenerative diseases such as Parkinson’s disease, multiple sclerosis, and amyotrophic lateral sclerosis (ALS, or Lou Gehrig’s disease), conditions that often leave them with an irreversible loss of speech.

Scientists have long worked on restoring lost function and repairing damaged nerves, and the brain-computer interface (BCI) is one of the key areas of that effort.

A brain-computer interface is a direct connection created between the human or animal brain and an external device, enabling the exchange of information between the brain and the device.

The term “brain” in BCI refers to the brain or nervous system of an organic life form, not merely the abstract “mind”

But brain-computer interfaces have always seemed a distant concept. Now, the paper “Speech Synthesis from Neural Decoding of Spoken Sentences”, published in the leading academic journal Nature, marks a big step forward in brain-computer interface research.

The plight of people with speech disorders

In fact, brain-computer interface research has been going on for more than 40 years. But so far, the most successful and most widely used sensory-restoration technology is the cochlear implant.

To this day, some people with severe speech impairments can only express their thoughts word by word through assistive devices.

These assistive devices can track very subtle eye or facial muscle movements and spell words according to the patient’s movements.

Physicist Stephen Hawking used one of these devices in his wheelchair.

Hawking relied on a speech synthesizer to “speak” and used a number of assistive communication systems over the years

Hawking relied on muscle movements, detected by an infrared sensor, to issue commands that selected letters as a computer cursor scanned through them, spelling out the words he wanted. A text-to-speech device then “spoke” the words aloud. It is thanks to such cutting-edge technology that we can read his book A Brief History of Time.

However, producing text or synthesizing speech with such devices is laborious and error-prone, and the output is very slow, usually topping out at around 10 words per minute. Hawking was considered very fast, yet he could spell only 15-20 words per minute. Natural speech, by contrast, runs at 100 to 150 words per minute.

In addition, this method is severely limited by how much the user’s own body can still move.

To solve these problems, researchers in the brain-computer interface field have been studying how to decode the corresponding cortical signals directly into speech.

Neural networks interpret brain signals to synthesize speech

Now, a breakthrough has been made.

Edward Chang, a professor of neurosurgery at the University of California, San Francisco, and his colleagues report in their paper “Speech Synthesis from Neural Decoding of Spoken Sentences” a brain-computer interface that can decode the neural signals generated during human speech and synthesize them into spoken sentences. The system can generate up to 150 words per minute, close to the rate of natural human speech.

Lead author Gopala Anumanchipalli holds an example array of the intracranial electrodes used to record brain activity in the current study

The team recruited five people undergoing treatment for epilepsy and had them speak hundreds of sentences out loud while recording their high-density electrocorticography (ECoG) signals and tracking neural activity in the ventral sensorimotor cortex, the brain’s speech-production center.

Using a recurrent neural network (RNN), the researchers decoded the collected neural signals in two steps.

In the first step, they converted the neural signals into signals representing the movements of the vocal organs, including the jaw, throat, lips, and tongue.

In the second step, the decoded vocal-organ movements were translated into spoken words.

Step diagram of a brain-computer interface for speech synthesis
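To make the two-step decoding more concrete, here is a minimal sketch of the idea in Python (PyTorch). The electrode count, layer sizes, and everything about training are purely illustrative assumptions, not the paper’s actual architecture; only the overall shape of the pipeline, neural signals to articulator movements to acoustic parameters, follows the description in the article (the 33 and 32 feature counts are the ones mentioned below).

```python
# Minimal sketch of a two-stage recurrent decoder (illustrative only).
# Stage 1: neural (ECoG) features -> articulator movement features.
# Stage 2: articulator movements  -> acoustic parameters.
import torch
import torch.nn as nn

class ArticulationDecoder(nn.Module):
    """Stage 1: maps neural recordings to articulatory movement features."""
    def __init__(self, n_ecog_channels=256, n_articulatory=33, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_ecog_channels, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_articulatory)

    def forward(self, ecog):               # ecog: (batch, time, channels)
        h, _ = self.rnn(ecog)
        return self.out(h)                 # (batch, time, 33)

class AcousticDecoder(nn.Module):
    """Stage 2: maps articulator movements to acoustic parameters (pitch, voicing, ...)."""
    def __init__(self, n_articulatory=33, n_acoustic=32, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_articulatory, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, articulation):       # (batch, time, 33)
        h, _ = self.rnn(articulation)
        return self.out(h)                 # (batch, time, 32)

# Chaining the two stages on a dummy trial of 200 time steps from 256 electrodes:
stage1, stage2 = ArticulationDecoder(), AcousticDecoder()
ecog = torch.randn(1, 200, 256)
acoustic_params = stage2(stage1(ecog))     # (1, 200, 32); a vocoder would turn this into audio
```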

In the decoding process, the researchers first decoded the continuous electrical signals that invasive electrodes recorded from the surface of three brain regions as the patient spoke.

This decoding yielded 33 kinds of movement features of the articulatory organs, which were then decoded into 32 speech parameters (including pitch and voicing), and finally speech waveforms were synthesized from those parameters.
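The study’s actual waveform synthesis is far more sophisticated than anything shown here. Purely as a toy illustration of that final step, and not the paper’s method, the sketch below renders just two of the parameter types the article mentions, pitch and voicing, into an audible waveform; every other detail is an assumption.

```python
# Toy illustration of the final step: frame-level acoustic parameters -> waveform.
# Real systems use full spectral envelopes and a proper vocoder; here we render
# only pitch (F0) and voicing, using a phase accumulator for voiced frames and
# low-level noise for unvoiced ones.
import numpy as np

def synthesize(f0_hz, voiced, sr=16000, frame_len=0.01):
    """f0_hz and voiced hold one value per 10 ms frame; returns a mono waveform."""
    samples_per_frame = int(sr * frame_len)
    phase, out = 0.0, []
    for f0, v in zip(f0_hz, voiced):
        t = np.arange(samples_per_frame)
        if v:   # voiced frame: a sine wave at the fundamental frequency
            frame = 0.3 * np.sin(2 * np.pi * f0 * t / sr + phase)
            phase += 2 * np.pi * f0 * samples_per_frame / sr  # keep phase continuous
        else:   # unvoiced frame: quiet noise
            frame = 0.05 * np.random.randn(samples_per_frame)
        out.append(frame)
    return np.concatenate(out)

# Half a second of a rising pitch contour, voiced throughout.
wave = synthesize(np.linspace(120, 180, 50), np.ones(50, dtype=bool))
```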

To analyze how accurately the synthetic speech reproduced real speech, the researchers compared the acoustic characteristics of the original and synthesized audio. They found that the speech decoded by the neural network reproduced the individual phonemes the patient had spoken quite faithfully, as well as the natural transitions and pauses between them.

The original speech sound wave (top) is compared to the synthetic speech sound wave (bottom)
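The paper defines its own evaluation metrics, which are not reproduced here. Purely as an illustration of what “comparing acoustic characteristics” can mean, the sketch below (a hypothetical helper using the librosa library, with made-up input arrays) correlates the log-mel spectrograms of an original and a synthesized recording frame by frame.

```python
# Illustrative comparison (not the paper's metric): correlate the log-mel
# spectrograms of the original and synthesized audio, frame by frame.
import numpy as np
import librosa

def spectral_similarity(original, synthesized, sr=16000):
    """Mean per-frame Pearson correlation between two log-mel spectrograms."""
    n = min(len(original), len(synthesized))          # align the two waveforms
    mels = [
        librosa.power_to_db(
            librosa.feature.melspectrogram(y=y[:n], sr=sr, n_mels=80))
        for y in (original, synthesized)
    ]
    frames = min(m.shape[1] for m in mels)
    corrs = [np.corrcoef(mels[0][:, i], mels[1][:, i])[0, 1]
             for i in range(frames)]
    return float(np.mean(corrs))

# similarity = spectral_similarity(orig_wave, synth_wave)  # values near 1.0 mean very similar
```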

The researchers then evaluated the decoder through crowdsourcing, asking online listeners to identify the synthesized speech. In the end, the listeners could correctly identify the synthesized words almost 70 percent of the time.

In addition, the researchers tested the decoder’s ability to synthesize speech without audible speaking. A subject spoke a sentence and then mouthed the same sentence silently (with the same movements, but without sound). The results showed that the speech spectrum the decoder synthesized from the silently mimed sentence was similar to the spectrum synthesized from the spoken version of the same sentence.

A milestone: challenges and expectations coexist

“This is the first study to show that we can generate complete spoken sentences based on an individual’s brain activity,” Chang said. “This is exciting. This is technology that is already within reach and we should be able to build clinically viable devices for patients with speech loss.”

Dr. Edward Chang’s research focuses on the brain mechanisms of speech, movement, and human emotion

Gopala Anumanchipalli, first author of the paper, added: “I am proud to be able to bring expertise in neuroscience, linguistics and machine learning as part of this important milestone in helping people with neurological disabilities.”

Of course, a number of challenges remain before a speech-synthesis brain-computer interface can be fully realized, such as whether patients can undergo the invasive surgery needed to implant the electrodes, and whether the brain signals recorded in this experiment match those of patients who have actually lost the ability to speak.

However, this study shows that a speech-synthesis brain-computer interface is no longer just a concept.

I look forward to the day when people with speech disorders can regain the ability to “speak” and express their feelings.
