Advantages and Pitfalls of Dataset Condensation: An Approach to Keyword Spotting with Time-Frequency Representations

Pereira, Pedro Henrique; Beccaro, Wesley; Ramirez, Miguel Arjona

Author(s):	Pereira, Pedro Henrique; Beccaro, Wesley; Ramirez, Miguel Arjona
Total Authors:	3
Document type:	Journal article
Source:	ELECTRONICS; v. 13, n. 11, 13 pp., 2024-06-01.
Abstract
With the exponential growth of data, efficient techniques for extracting the relevant information from datasets are becoming increasingly important. Reducing the amount of training data can be useful in applications where storage space or computational resources are limited. In this work, we explore data condensation (DC) in the context of keyword spotting (KWS) systems. Using deep learning architectures and time-frequency speech representations, we obtained condensed speech signal representations through gradient matching with Efficient Synthetic-Data Parameterization. Through a series of classification experiments, we analyze the performance of the models and of the condensed data in terms of accuracy and the number of samples per class. We also present cross-model results, in which models are trained with condensed data obtained from a different architecture. Our findings demonstrate the potential of data condensation in the speech domain to reduce dataset size while retaining the most important information and maintaining high accuracy for models trained on the condensed dataset. With ConvNet, we obtained an accuracy of 80.75% using 30 condensed speech representations per class, an absolute improvement of 24.9 percentage points over using 30 random samples from the original training dataset. However, the cross-model tests reveal the limitations of this approach. We also highlight the challenges and opportunities for further improving the accuracy of condensed data when it is obtained with one neural network architecture and used to train another. (AU)
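The abstract names gradient matching as the condensation objective. As a rough illustration only, the following is a minimal sketch of a class-wise gradient-matching loss in the style of the original dataset-condensation formulation. For each class, it computes the gradient of the training loss on a real batch and on a synthetic batch, then sums a per-output-unit cosine distance between the two gradients. It rests on several assumptions that do not come from the paper. A linear softmax classifier on flattened time-frequency features stands in for the ConvNet. The feature shape (40 x 32) and all function names are hypothetical. The Efficient Synthetic-Data Parameterization step is not reproduced.

```python
import numpy as np

# Hypothetical toy setup: a linear softmax model (logits = X @ W) stands in
# for the ConvNet; features are assumed to be flattened spectrograms.

def softmax(logits):
    # Numerically stable row-wise softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def weight_gradient(W, X, y, num_classes):
    """Gradient of the mean cross-entropy of logits = X @ W with respect to W."""
    probs = softmax(X @ W)
    onehot = np.eye(num_classes)[y]
    return X.T @ (probs - onehot) / X.shape[0]

def gradient_distance(g_real, g_syn, eps=1e-8):
    """Sum over output units (columns of W) of 1 - cosine similarity."""
    dot = np.sum(g_real * g_syn, axis=0)
    norms = np.linalg.norm(g_real, axis=0) * np.linalg.norm(g_syn, axis=0)
    return float(np.sum(1.0 - dot / (norms + eps)))

def class_wise_matching_loss(W, X_real, y_real, X_syn, y_syn, num_classes):
    """Gradient-matching objective: per-class distance between real and synthetic gradients."""
    loss = 0.0
    for c in range(num_classes):
        real_mask, syn_mask = y_real == c, y_syn == c
        g_real = weight_gradient(W, X_real[real_mask], y_real[real_mask], num_classes)
        g_syn = weight_gradient(W, X_syn[syn_mask], y_syn[syn_mask], num_classes)
        loss += gradient_distance(g_real, g_syn)
    return loss

rng = np.random.default_rng(0)
num_classes, feat_dim = 3, 40 * 32          # hypothetical: 40 bins x 32 frames
X_real = rng.normal(size=(60, feat_dim))    # 20 real samples per class
y_real = np.repeat(np.arange(num_classes), 20)
X_syn = rng.normal(size=(num_classes * 2, feat_dim))  # 2 synthetic samples per class
y_syn = np.repeat(np.arange(num_classes), 2)
W = rng.normal(scale=0.01, size=(feat_dim, num_classes))

print(class_wise_matching_loss(W, X_real, y_real, X_syn, y_syn, num_classes))
```

In an actual condensation loop, the synthetic samples themselves would be updated by differentiating this loss with respect to `X_syn`, typically across many randomly initialized networks. That step requires an automatic-differentiation framework, so it is omitted here. The cosine form makes the objective insensitive to gradient magnitude. It is therefore zero when the synthetic batch induces gradients pointing in the same directions as the real batch.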

FAPESP's process:	22/10909-5 - Differentiable learning and processing of temporal, spatial, spectral and time-frequency signal representations
Grantee:	Miguel Arjona Ramírez
Support Opportunities:	Regular Research Grants
