(Reference retrieved automatically from Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

Improving the potential of Enhanced Teager Energy Cepstral Coefficients (ETECC) for replay attack detection

Author(s):
Patil, Ankur T. [1] ; Acharya, Rajul [1] ; Patil, Hemant A. [1] ; Guido, Rodrigo Capobianco [2]
Total number of authors: 4
Author affiliation(s):
[1] Dhirubhai Ambani Inst Informat & Commun Technol I, Speech Res Lab, Gandhinagar 382007 - India
[2] Unesp Univ Estadual Paulista Sao Paulo State Univ, Inst Biociencias Letras & Ciencias Exatas, Rua Cristovao Colombo 2265, BR-15054000 Sao Jose Do Rio Preto, SP - Brazil
Total number of affiliations: 2
Document type: Scientific article
Source: COMPUTER SPEECH AND LANGUAGE; v. 72, MAR 2022.
Web of Science citations: 0
Abstract

In voice biometrics, a replay attack (RA) is a dishonest attempt by an impostor to spoof someone else's identity by replaying the subject's previously recorded speech close to the Automatic Speaker Verification (ASV) system under attack. State-of-the-art strategies for RA detection, such as the Enhanced Teager Energy Cepstral Coefficients (ETECC), have shown promising results due to their precision in measuring the energy of high-frequency speech components as a function of two recently defined concepts: signal mass and the Enhanced Teager Energy Operator (ETEO). Nevertheless, since the replay mechanism prominently deteriorates the speech spectrum precisely in those spectral zones, we propose combining ETEO with different strategies to further improve on the previous results and obtain effective countermeasures against RAs. Specifically, this paper provides comprehensive evaluations, including a detailed mathematical analysis, a simulation on amplitude- and frequency-modulated (AM-FM) signals, and a spectrographic inspection involving different filterbank structures, along with their experimental results. In addition, ETEO-derived features are compared with existing feature sets by using Paraconsistent Feature Engineering (PFE) for feature ranking, expanding our previously published results. Lastly, experiments are performed on the ASVSpoof-2017 version 2.0 dataset, the Realistic Replay Attack Microphone Array Speech Corpus (ReMASC), the BTAS-2016 dataset, the ASVSpoof-2019 challenge dataset, and the ASVSpoof-2015 challenge dataset, with Gaussian Mixture Models (GMMs), Convolutional Neural Networks (CNNs), and Light-CNN architectures as classifiers. The standalone ETECC-GMM system showed the best performance, producing equal error rates (EERs) of 5.55% and 10.75% on the development and evaluation sets, respectively. (AU)
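
As a rough illustration of why Teager-type operators respond strongly to high-frequency content, the sketch below implements the classical discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. This is not the ETEO or the ETECC pipeline described in the abstract, whose definitions are not given here; the function names `teager_energy` and `tone`, the 16 kHz sampling rate, and the two test frequencies are all assumptions chosen for illustration. For a pure tone A*cos(Omega*n), the classical operator returns A^2 * sin^2(Omega), so its output grows with both amplitude and frequency.

```python
import numpy as np

def teager_energy(x):
    """Classical discrete TEO: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Illustrative only; this is NOT the ETEO from the paper.
    The output is two samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def tone(freq_hz, fs=16000, amp=1.0, n=400):
    """Hypothetical helper: a cosine of the given frequency and amplitude."""
    t = np.arange(n) / fs
    return amp * np.cos(2 * np.pi * freq_hz * t)

fs = 16000  # assumed sampling rate in Hz
low = teager_energy(tone(500, fs)).mean()    # low-frequency tone
high = teager_energy(tone(4000, fs)).mean()  # high-frequency tone

# For A*cos(Omega*n) the operator equals A^2 * sin^2(Omega),
# with Omega = 2*pi*f/fs in radians per sample.
expected_low = np.sin(2 * np.pi * 500 / fs) ** 2
expected_high = np.sin(2 * np.pi * 4000 / fs) ** 2

print(f"TEO at 500 Hz:  {low:.4f} (theory {expected_low:.4f})")
print(f"TEO at 4000 Hz: {high:.4f} (theory {expected_high:.4f})")
```

With equal amplitudes, the 4000 Hz tone yields a much larger operator output than the 500 Hz tone, which is consistent with the abstract's remark that such features are sensitive to the high-frequency regions where replay degrades the spectrum.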

FAPESP grant: 19/04475-0 - Paraconsistent Analysis of Speech Signal Features: combating voice spoofing attacks
Grantee: Rodrigo Capobianco Guido
Support type: Regular Research Grants