Posted by Katrin Tomanek, Software Engineer and Bob MacDonald, Technical Program Manager, Google Research
Speech impairments affect millions of people, with underlying causes ranging from neurological or genetic conditions to physical impairment, brain damage or hearing loss. Similarly, the resulting speech patterns are diverse, including stuttering, dysarthria, apraxia, etc., and can have a detrimental impact on self-expression, participation in society and access to voice-enabled technologies. Automatic speech recognition (ASR) technologies have the potential to help individuals with such speech impairments by improving access to dictation and home automation and by enhancing communication. However, while the increased computational power of deep learning systems and the availability of large training datasets has improved the accuracy of ASR systems, their performance is still insufficient for many people with speech disorders, rendering the technology unusable for many of the speakers who could benefit the most.
In 2019, we introduced Project Euphonia and discussed how we could use personalized ASR models of disordered speech to achieve accuracies on par with non-personalized ASR on typical speech. Today we share the results of two studies, presented at Interspeech 2021, that aim to expand the availability of personalized ASR models to more users. In “Disordered Speech Data Collection: Lessons Learned at 1 Million Utterances from Project Euphonia”, we present a greatly expanded collection of disordered speech data, composed of over 1 million utterances. Then, in “Automatic Speech Recognition of Disordered Speech: Personalized models outperforming human listeners on short phrases”, we discuss our efforts to generate personalized ASR models based on this corpus. This approach leads to highly accurate models that can achieve up to 85% improvement to the word error rate in select domains compared to out-of-the-box speech models trained on typical speech.
Impaired Speech Data Collection
Since 2019, speakers with speech impairments of varying degrees of severity across a variety of conditions have provided voice samples to support Project Euphonia’s research
This article is purposely trimmed, please visit the source to read the full article.
The post Personalized ASR Models from a Large and Diverse Disordered Speech Dataset appeared first on Google AI Blog.