Improvements in Brazilian Portuguese Speech Emotion Recognition and its extension to Latin Corpora

Author(s):
Joshi, Neelakshi ; Paiva, Pedro V. V. ; Batista, Murillo ; Cruz, Marcos V. ; Ramos, Josue J. G. ; IEEE
Total Authors: 6
Document type: Journal article
Source: 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN); v. N/A, p. 8-pg., 2022-01-01.
Abstract

Speech emotion recognition (SER) is challenging, language dependent, and, as with any supervised learning task, constrained by data availability. An important aspect of SER modeling is the selection of optimal features, so that emotions are recognized without bias while accuracy is retained. Brazilian Portuguese (BP) is a dialect of the 6th most spoken language in the world, yet BP SER studies are scarce. This work aims to explore the only available BP SER database, providing better features to increase the recognition rate for all emotions, and then proposes a simple but robust multi-corporal SER model. For all corpora analyzed in this work, an improvement of up to 9% in mean accuracy is achieved, and the obtained recognition rate indicates that the proposed feature sets show comparatively less biased behavior towards all emotions. Also, we prescribe a combination of different metrics to be used with two cepstral features to obtain a higher recognition rate. Our proposed state-of-the-art Latin multi-corporal model contains few features and, with a classical machine learning classifier, outperforms previously used complex features, algorithms, and architectures, yielding the best recognition rate. (AU)

FAPESP's process: 20/07074-3 - Socially interactive robots acting in public environments
Grantee: Josué Junior Guimarães Ramos
Support Opportunities: Regular Research Grants