Neural prosthesis uses brain activity to decode speech

Summary: A newly developed machine learning model can predict the words a person is about to speak based on their neural activity recorded by a minimally invasive neuroprosthetic device.

Source: HSE

Researchers at HSE University and Moscow State University of Medicine and Dentistry have developed a machine learning model that can predict the word about to be uttered by a subject based on their neural activity recorded with a small set of minimally invasive electrodes.

The article ‘Speech decoding from a small set of spatially segregated minimally invasive intracranial EEG electrodes with a compact and interpretable neural network’ was published in the Journal of Neural Engineering. The research was funded by a grant from the Russian government as part of the ‘Science and Universities’ National Project.

Millions of people around the world are affected by speech disorders that limit their ability to communicate. Causes of speech loss can vary and include stroke and certain congenital conditions.

Technology is available today to restore communication function in these patients, including ‘silent speech’ interfaces that recognize speech by tracking the movement of the articulatory muscles as the person mouths words without making any sound. However, such devices help some patients but not others, for example people with facial muscle paralysis.

Speech neuroprostheses – brain-computer interfaces capable of decoding speech based on brain activity – may provide an affordable and reliable solution for restoring communication for these patients.

Unlike personal computers, devices with a brain-computer interface (BCI) are controlled directly by the brain without the need for a keyboard or microphone.

A major barrier to the wider use of BCIs in speech prostheses is that this technology requires highly invasive surgery to implant electrodes into brain tissue.

The most accurate speech recognition is achieved by neuroprostheses with electrodes covering a large area of the cortical surface. However, such solutions for reading brain activity are not intended for long-term use and pose significant risks to patients.

Researchers at the HSE Center for Bioelectric Interfaces and the Moscow State University of Medicine and Dentistry studied the possibility of creating a functional neuroprosthesis capable of decoding speech with acceptable accuracy by reading brain activity from a small set of electrodes implanted in a limited cortical area.

The authors suggest that, in the future, this minimally invasive procedure could even be performed under local anesthesia. In the present study, researchers collected data from two patients with epilepsy who had already been implanted with intracranial electrodes for pre-surgical mapping purposes to locate seizure onset zones.

The first patient was implanted bilaterally with a total of five sEEG shafts with six contacts each, and the second patient was implanted with nine electrocorticographic (ECoG) strips with eight contacts each.

Unlike ECoG, sEEG electrodes can be implanted without a full craniotomy, via a small burr hole in the skull. In this study, only the six contacts of a single sEEG shaft in one patient and the eight contacts of a single ECoG strip in the other were used to decode neural activity.

Subjects were asked to read aloud six sentences, each presented 30 to 60 times in random order. The sentences varied in structure, and most words within a single sentence started with the same letter. The sentences contained a total of 26 different words. As the subjects read, the electrodes recorded their brain activity.

These data were then aligned with the audio signals to form 27 classes: 26 words plus one silence class. The resulting training dataset (containing the signals recorded within the first 40 minutes of the experiment) was fed into a machine learning model with a neural network-based architecture.

The learning task for the neural network was to predict the next spoken word (class) based on neural activity data prior to its utterance.
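
As a rough sketch of this setup, pairing each word onset with the window of multichannel neural activity that precedes it could look like the following. The function name, sampling details, and window length are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def make_examples(neural, events, window=500):
    """Pair each word onset with the pre-utterance window of neural data.

    neural: array of shape (n_samples, n_channels), the intracranial recording
    events: list of (onset_sample, class_id) pairs derived from the aligned
            audio; class ids 0-25 are the 26 words and 26 is the silence class
    window: number of samples of neural activity taken *before* each onset
    """
    X, y = [], []
    for onset, class_id in events:
        if onset >= window:  # keep only onsets with a full preceding window
            X.append(neural[onset - window:onset])
            y.append(class_id)
    return np.stack(X), np.array(y)
```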

When designing the neural network architecture, the researchers aimed to make it simple, compact, and easily interpretable. They created a two-stage architecture that first extracted internal speech representations from the recorded brain activity data, producing log-mel spectral coefficients, and then predicted a specific class, i.e., a word or silence.
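
A minimal sketch of such a two-stage design, assuming a PyTorch implementation with made-up layer sizes (the paper's exact architecture is not reproduced here): a 1x1 convolution acts as a learnable spatial filter over the contacts, a temporal convolution produces a compact spectral representation standing in for the log-mel coefficients, and a linear head maps it to one of the 27 classes.

```python
import torch
import torch.nn as nn

class TwoStageDecoder(nn.Module):
    """Hypothetical sketch: neural activity -> speech features -> word class."""

    def __init__(self, n_channels=6, n_mels=40, n_classes=27):
        super().__init__()
        # Stage 1: learnable spatial filtering (mixing the recording contacts)
        # followed by temporal filtering that yields log-mel-like features.
        self.feature_extractor = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=1),           # spatial filter
            nn.Conv1d(16, n_mels, kernel_size=25, padding=12),  # temporal filter
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                            # pool over time
        )
        # Stage 2: predict one of the 26 words or the silence class.
        self.classifier = nn.Linear(n_mels, n_classes)

    def forward(self, x):  # x: (batch, channels, time)
        feats = self.feature_extractor(x).squeeze(-1)  # (batch, n_mels)
        return self.classifier(feats)                  # (batch, n_classes)
```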

Thus trained, the neural network achieved 55% accuracy using only six channels of data recorded by a single sEEG electrode in the first patient, and 70% accuracy using only eight channels of data recorded by a single ECoG strip in the second patient. Such accuracy is comparable to that demonstrated in other studies using devices that required electrodes to be implanted over the entire cortical surface.

Because the resulting model is interpretable, it is possible to explain in neurophysiological terms which neural information contributes most to predicting a word about to be pronounced.

The researchers examined signals coming from different neuronal populations to determine which ones were essential for the downstream task.
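
One generic way to run such a probe, not necessarily the authors' exact procedure and reusing the hypothetical model sketched above, is channel ablation: silence one recording contact at a time and measure how much classification accuracy drops.

```python
import torch

def channel_importance(model, X, y):
    """Ablation probe: zero out each input channel, record the accuracy drop."""
    def accuracy(inputs):
        with torch.no_grad():
            return (model(inputs).argmax(dim=1) == y).float().mean().item()

    baseline = accuracy(X)
    drops = []
    for ch in range(X.shape[1]):       # X has shape (batch, channels, time)
        ablated = X.clone()
        ablated[:, ch, :] = 0.0        # silence one contact's signal
        drops.append(baseline - accuracy(ablated))
    return drops  # larger drop = that neuronal population matters more
```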

Their findings were consistent with the speech mapping results, suggesting that the model relies on neural signals that are fundamental to speech and can therefore be used to decode imagined speech.

Another advantage of this solution is that it does not require manual feature engineering. The model learned to extract speech representations directly from the brain activity data.

The interpretability of the results also indicates that the network decodes signals from the brain itself and not from any concomitant activity, such as electrical signals from the articulatory muscles or those arising from a microphone effect.

The researchers emphasize that the prediction was always based on neural activity data prior to utterance. This, they argue, ensures that the decision rule does not use the auditory cortex’s response to speech already uttered.

“The use of such interfaces involves minimal risks for the patient. If everything works out, it may be possible to decode imagined speech from neural activity recorded by a small number of minimally invasive electrodes implanted in an outpatient setting under local anesthesia,” says Alexey Ossadtchi, lead author of the study and Director of the Center for Bioelectric Interfaces at the HSE Institute for Cognitive Neuroscience.

About this neurotechnology research news

Author: Ksenia Bregadze
Source: HSE
Contact: Ksenia Bregadze – HSE
Image: The image is in the public domain

Original Research: Closed access.
“Speech decoding from a small set of spatially segregated minimally invasive intracranial EEG electrodes with a compact and interpretable neural network” by Alexey Ossadtchi et al. Journal of Neural Engineering


Summary

Speech decoding from a small set of spatially segregated minimally invasive intracranial EEG electrodes with a compact and interpretable neural network

Objective. Speech decoding, one of the most intriguing brain-computer interface applications, opens up plentiful opportunities, from rehabilitation of patients to direct and seamless communication between humans. Typical solutions rely on invasive recordings with a large number of distributed electrodes implanted through a craniotomy. Here we explored the possibility of creating speech prostheses in a minimally invasive setting with a small number of spatially segregated intracranial electrodes.

Approach. We collected one hour of data (from two sessions) in two patients implanted with invasive electrodes. We then used only the contacts belonging to a single stereotactic electroencephalographic (sEEG) shaft or a single electrocorticographic (ECoG) strip to decode neural activity into 26 words and one silence class. We employed a compact convolutional network-based architecture whose spatial and temporal filter weights permit a physiologically plausible interpretation.

Main results. We achieved an average accuracy of 55% using only six channels of data recorded with a single minimally invasive sEEG electrode in the first patient, and 70% accuracy using only eight channels of data recorded from a single ECoG strip in the second patient, in the classification of 26+1 overtly pronounced words. Our compact architecture did not require the use of pre-engineered features, learned quickly, and resulted in a stable, interpretable, and physiologically meaningful decision rule that successfully operated on a contiguous dataset collected during a time interval different from the one used for training. The spatial characteristics of the key neuronal populations corroborate the results of active and passive speech mapping and exhibit the inverse space-frequency relationship characteristic of neural activity. Compared to other architectures, our compact solution performed on a par with or better than those recently presented in the neural speech decoding literature.

Significance. We showed the possibility of building a speech prosthesis with a small number of electrodes, based on a compact, feature-engineering-free decoder derived from a small amount of training data.
