Study, Analysis and Recognition of Dysarthric Speech
Abstract
Dysarthria, a term derived from 'dys' (difficulty) and 'arthr' (articulation), is a neurological speech disorder that occurs mainly due to cerebral strokes or significant traumatic incidents. It is characterized by a spectrum of speech impairments, including but not limited to unintelligible speech, inconsistent speech pace, atypical prosody, slurred speech, poor voice quality and imprecise articulation. As the severity of the condition increases, the coordination between lip and tongue movements deteriorates, resulting in highly unintelligible speech. Compared to healthy speech, dysarthric speech is much more challenging to recognize due to inconsistencies in the acoustic signal and limited data availability. This study introduces a new approach based on the combination of recognition, characterization, synthesis and human assessment of dysarthric speech. The goal is to enhance the performance of automatic speech recognition (ASR) systems for dysarthric speakers. Additionally, the approach aims to support the dysarthric speech assessment process, so that proper treatment can be provided with less intervention. We aim to bridge the gap between dysarthric speakers and their interactions with machines, reducing complexity and ultimately improving their quality of life. This study primarily focuses on a speaker-adaptive approach, as the characteristics and behavior of each speaker vary significantly depending on the severity of their condition. For over a decade, researchers have been trying to improve ASR systems and rehabilitation for dysarthric speech, but results still lag behind and considerable scope for improvement remains. Early efforts focused on HMM-based systems with hand-crafted features, which could not handle the variability of dysarthric speech. With the development of new datasets, attention shifted to neural networks and deep learning techniques.
This study explores different algorithms and methods that could help dysarthric speakers lead better lives. We investigate the use of the Affinity Propagation (AP) algorithm on dysarthric speech segments to select the most informative feature set, one that captures key information about the speaker. Moving to larger datasets, we explore the LSTM-RNN architecture with a fusion of multiple audio descriptors. This additional information is provided to the classifier to improve its ability to identify each word, yielding an overall accuracy of 83.11%. However, data scarcity is a persistent challenge in the field of dysarthria. Given the limitations of available datasets, it is difficult for researchers to develop a robust ASR system. To address this, we took the existing UASpeech dataset and expanded it by generating new words that retain the characteristics and behaviors of the original speakers. This contribution enables the expansion of the dataset in both breadth and depth. This thesis also investigates the detection and severity classification of dysarthric speech, an essential component of treatment and recovery, using the Audio Spectrogram Transformer. Our model achieved an accuracy of 99.64%, surpassing previously reported state-of-the-art results. We also explore dysarthric speech at the phoneme level to pinpoint the specific areas where speakers face difficulties. Using the Goodness of Pronunciation (GOP) algorithm, we identified the phoneme sets that speakers most frequently misarticulate. These insights will assist both speakers and speech pathologists in facilitating early recovery. To ensure the usability of the UASpeech database, we annotated every audio file and obtained ground truth information for the dataset. This annotation process allowed us to accurately capture the essential details of the data, making it suitable for further research and analysis.
The experimental results indicate that the proposed methods and findings can contribute to improving the quality of life of dysarthric individuals.
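The phoneme-level analysis mentioned in the abstract relies on Goodness of Pronunciation (GOP) scores. The abstract does not state which GOP formulation the thesis uses, so the following is only a minimal sketch of one common posterior-based variant: for a segment aligned to a canonical phone, the score is the average log posterior of that phone minus the best average log posterior over all phones. The function names, the toy posteriors and the threshold value are illustrative assumptions, not taken from the thesis.

```python
import math

def gop_score(frame_posteriors, canonical_phone, eps=1e-10):
    """Posterior-based GOP for one aligned segment (illustrative variant).

    frame_posteriors: list of dicts mapping phone -> posterior, one per frame.
    Returns a value <= 0; 0 means the canonical phone is the best-scoring one.
    """
    if not frame_posteriors:
        raise ValueError("segment has no frames")
    n_frames = len(frame_posteriors)
    phones = list(frame_posteriors[0].keys())
    # Average log posterior of every phone over the segment's frames.
    avg_log = {
        p: sum(math.log(max(f.get(p, 0.0), eps)) for f in frame_posteriors) / n_frames
        for p in phones
    }
    return avg_log[canonical_phone] - max(avg_log.values())

def flag_misarticulated(segments, threshold=-1.0):
    """Return canonical phones whose GOP falls below a (hypothetical) threshold.

    segments: list of (canonical_phone, frame_posteriors) pairs.
    """
    return [phone for phone, frames in segments
            if gop_score(frames, phone) < threshold]

# Toy posteriors (made up for illustration only).
frames_good = [{"s": 0.8, "sh": 0.15, "t": 0.05},
               {"s": 0.7, "sh": 0.2, "t": 0.1}]
frames_bad = [{"s": 0.1, "sh": 0.85, "t": 0.05},
              {"s": 0.2, "sh": 0.7, "t": 0.1}]

flagged = flag_misarticulated([("s", frames_good), ("s", frames_bad)])
```

In practice the frame posteriors would come from an acoustic model after forced alignment, and counting flagged phones per speaker gives a rough picture of which phonemes are misarticulated most often.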
Description
Supervisor: Das, Pradip K
Creative Commons license
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-sa/4.0/

