
Residual Neural Network precisely quantifies dysarthria severity-level based on short-duration speech segments

Author(s):
Gupta, Siddhant [1] ; Patil, Ankur T. [1] ; Purohit, Mirali [1] ; Parmar, Mihir [2] ; Patel, Maitreya [1] ; Patil, Hemant A. [1] ; Guido, Rodrigo Capobianco [3]
Total Authors: 7
Affiliation:
[1] Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Speech Research Lab, Gandhinagar 382007, India
[2] Arizona State University, Tempe, AZ, USA
[3] São Paulo State University (UNESP), Institute of Biosciences, Letters and Exact Sciences, Rua Cristóvão Colombo 2265, 15054-000 São José do Rio Preto, SP, Brazil
Total Affiliations: 3
Document type: Journal article
Source: NEURAL NETWORKS; v. 139, p. 105-117, JUL 2021.
Web of Science Citations: 0
Abstract

Recently, we have witnessed Deep Learning methodologies gaining significant attention for severity-based classification of dysarthric speech. Detecting dysarthria and quantifying its severity are of paramount importance in various real-life applications, such as assessing patients' progression in treatment, which includes adequate planning of their therapy, and improving speech-based interactive systems so that they handle pathologically-affected voices automatically. Notably, current speech-powered tools often deal with short-duration speech segments and, consequently, are less efficient at handling impaired speech, even when using Convolutional Neural Networks (CNNs). Thus, detecting dysarthria severity-level from short speech segments may improve the performance and applicability of those systems. To achieve this goal, we propose a novel Residual Network (ResNet)-based technique that receives short-duration speech segments as input. Statistically meaningful objective analysis of our experiments, reported over the standard Universal Access corpus, shows average improvements of 21.35% in classification accuracy and 22.48% in F1-score over the baseline CNN. For additional comparison, tests with Gaussian Mixture Models (GMMs) and Light CNNs were also performed. Overall, the proposed ResNet approach achieved 98.90% classification accuracy and a 98.00% F1-score, confirming its efficacy and supporting its practical applicability. © 2021 Elsevier Ltd. All rights reserved.
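For illustration, the following is a minimal PyTorch sketch of the kind of ResNet-style classifier the abstract describes: short spectrogram patches in, a severity class out. It is not the authors' implementation; the mel-spectrogram input shape, the layer widths, and the choice of four severity classes are assumptions made for this example.

# Hypothetical sketch of a ResNet-style severity classifier for
# short-duration speech segments. Layer sizes, the four severity
# classes, and the mel-spectrogram front end are illustrative
# assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip (identity) connection."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 projection so the shortcut matches the main path's shape.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # residual addition


class SeverityResNet(nn.Module):
    """Small ResNet mapping a spectrogram patch to a severity class."""

    def __init__(self, n_classes: int = 4):  # assumed: four severity levels
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1, bias=False),
            nn.BatchNorm2d(16),
            nn.ReLU(),
        )
        self.layers = nn.Sequential(
            ResidualBlock(16, 16),
            ResidualBlock(16, 32, stride=2),
            ResidualBlock(32, 64, stride=2),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):  # x: (batch, 1, n_mels, n_frames)
        h = self.layers(self.stem(x))
        h = F.adaptive_avg_pool2d(h, 1).flatten(1)  # global average pooling
        return self.head(h)


if __name__ == "__main__":
    # A ~0.25 s segment at 16 kHz with a 10 ms hop gives roughly 25
    # frames; 40 mel bands is a common front-end choice (assumed here).
    model = SeverityResNet()
    dummy = torch.randn(8, 1, 40, 25)  # batch of 8 spectrogram patches
    logits = model(dummy)
    print(logits.shape)  # torch.Size([8, 4])

The global average pooling before the linear head is what lets a compact network like this accept short segments of varying frame counts, which matches the abstract's emphasis on short-duration inputs.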

FAPESP's process: 19/04475-0 - Paraconsistent Feature Analysis of Speech Signals: fighting the voice spoofing attacks
Grantee: Rodrigo Capobianco Guido
Support Opportunities: Regular Research Grants