Classification of feelings by voice in customer service in real time

Grant number: 20/05820-0
Support type: Research Grants - Innovative Research in Small Business - PIPE
Duration: February 01, 2022 - October 31, 2022
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal researcher: Gabriele Bellini
Grantee: Gabriele Bellini
Company: Tectra Soluções Integradas em Comunicação Ltda
CNAE: Development and licensing of non-customizable computer programs
Data processing, application service providers, and internet hosting services
City: São Paulo
Principal researchers:
Erikson Júlio de Aguiar


This project consists of a platform for the analysis of phone calls, capable of identifying emotions in real time. Such a tool is unprecedented in the Portuguese-speaking market and still incipient in the English-language market. The product is intended for call center services and for companies with a high volume of voice communication channels, such as banks, insurance companies, and telecommunications companies, among others. Despite the major changes the digital era has brought to the way companies interact with their customers, voice-based contact remains an important means of communication between the two parties. Phone service, for example, gives companies a more personal way to interact with their customers, which helps build a closer relationship between them. In particular, the call center market currently employs more than 1.4 million people in Brazil and had a revenue of R$ 51.8 billion in 2018. In 2019, the sector's growth was estimated at 5.6% over the previous year. Currently, in order to assess the level of satisfaction of their customers, companies need to assign a team to listen to recordings of phone calls and analyze a sample of them. However, this method can be inefficient or even impracticable, depending on the number of recordings. To assist this process, Neomove developed the K.A.R.L.A. platform (Knowledgeable Audio Recognition Learning Algorithm), a tool capable of transcribing and classifying phone call recordings. K.A.R.L.A. is based on the semantic analysis of audio transcriptions, with a classification model of three classes of emotions: happiness, unhappiness and neutrality. In this way, it is possible to identify the calls in which the customer shows dissatisfaction with the product or service, thus providing a pre-selection of the recordings that need to be analyzed in more detail.
However, there is still a need for a tool capable of analyzing the recordings automatically and in real time, providing relevant information about the content of the audio. Therefore, the tool proposed here aims to identify the level of satisfaction of the user in a customer service system through the recognition of voice patterns. Studies on emotional speech confirm that there is a close correlation between speech and emotion. Voice signals in human speech are a quick and easy means of communication, and they are of great importance in a Speech Emotion Recognition (SER) system. In addition to the syntactic and semantic evidence that speech conveys, human emotional and physical states can be recognized by processing the voice signal. SER systems are capable of transforming data from speech signals into information about the feelings of individuals in particular situations, for example, the reactions of customers to telemarketing services. Thus, it is possible to use speech patterns for the automatic recognition of the emotional state of human beings. For this purpose, Mel-frequency cepstral coefficients (MFCCs), the most common features in voice analysis applications, will be extracted from the audio. These features are based on the cepstrum and are inspired by the way the human ear responds to sound stimuli: the spectrum frequencies are placed on the Mel scale, a non-linear scale whose gradation seeks to imitate human auditory perception. In addition, prosodic features will also be incorporated into the proposed model, since such features comprise the sound, the jitter and the pitch of the voice in human speech. Finally, by integrating the analysis of emotions based on the frequency of the voice with the existing K.A.R.L.A. system, this project aims to improve the classification of emotions in customer service phone calls, thus obtaining consistent, online information about the quality of the service provided by the company. (AU)
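
The MFCC extraction mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the project's actual pipeline, which the abstract does not describe. The Hz-to-Mel formula below is one common variant, and the function names, frame length (400 samples), hop (160 samples), FFT size (512), filter count (26) and coefficient count (13) are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f_hz):
    # One common Hz-to-Mel mapping; roughly linear below 1 kHz, logarithmic above.
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(signal, sample_rate, n_mfcc=13, n_filters=26,
         frame_len=400, hop=160, n_fft=512):
    # Assumes len(signal) >= frame_len. All default values are illustrative.
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)            # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)                    # avoid log(0)
    # Unnormalized DCT-II of the log Mel energies gives the cepstral coefficients.
    n = np.arange(n_filters)
    k = np.arange(n_mfcc)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_mel @ dct_basis.T

# Usage: one second of a synthetic 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
features = mfcc(np.sin(2 * np.pi * 440.0 * t), sr)
print(features.shape)  # (98, 13): one row of 13 coefficients per frame
```

Each row of the result describes one short frame of audio. A classifier for emotional state would take such rows, possibly combined with prosodic measurements such as pitch and jitter, as its input features.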
