An End-to-End Deep Learning Approach for Video Captioning Through Mobile Devices

Author(s):
Pezzuto Damaceno, Rafael J. ; Cesar, Roberto M., Jr.
Total Authors: 2
Document type: Journal article
Source: PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2023, PT I; v. 14469, p. 15-pg., 2024-01-01.
Abstract

Video captioning is a computer vision task that aims at generating a description of video content. This can be achieved using deep learning approaches that leverage image and audio data. In this work, we have developed two strategies to tackle this task in the context of resource-constrained devices: (i) generating one caption per frame combined with audio classification, and (ii) generating one caption for a set of frames combined with audio classification. In these strategies, we have utilized one architecture for the image data and another for the audio data. We have developed an application tailored for resource-constrained devices, where the image sensor captures images at a specific frame rate. The audio data is captured from a microphone for a predefined duration at a time. Our application combines the results from both modalities to create a comprehensive description. The main contribution of this work is the introduction of a new end-to-end application that can utilize the developed strategies and be beneficial for environment monitoring. Our method has been implemented on a low-resource computer, which poses a significant challenge. (AU)
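
The two strategies in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (describe_per_frame, describe_frame_set, combine), the output format, and the merging of consecutive duplicate captions are all assumptions made for illustration. The image and audio models are passed in as plain callables, since the abstract does not name the architectures used.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins: a frame is any object the caption model accepts,
# and an audio clip is a raw byte buffer recorded for a fixed duration.
Frame = object
CaptionModel = Callable[[Sequence[Frame]], str]   # frames -> one caption
AudioModel = Callable[[bytes], str]               # audio clip -> class label


def combine(captions: List[str], audio_label: str) -> str:
    """Merge visual captions and the audio label into one description.

    The output format is an assumption, not taken from the paper.
    """
    return "; ".join(captions) + f" [audio: {audio_label}]"


def describe_per_frame(frames: Sequence[Frame], audio: bytes,
                       caption_model: CaptionModel,
                       audio_model: AudioModel) -> str:
    """Strategy (i): one caption per frame, plus audio classification."""
    captions: List[str] = []
    for frame in frames:
        caption = caption_model([frame])
        # Assumption: skip a caption identical to the previous one,
        # since neighbouring frames often look alike.
        if not captions or captions[-1] != caption:
            captions.append(caption)
    return combine(captions, audio_model(audio))


def describe_frame_set(frames: Sequence[Frame], audio: bytes,
                       caption_model: CaptionModel,
                       audio_model: AudioModel) -> str:
    """Strategy (ii): one caption for the whole set of frames, plus audio."""
    return combine([caption_model(frames)], audio_model(audio))
```

Under these assumptions, strategy (i) calls the image model once per frame, while strategy (ii) calls it once per set of frames, which is the trade-off a resource-constrained device has to weigh.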

FAPESP's process: 15/22308-2 - Intermediate representations in Computational Science for knowledge discovery
Grantee: Roberto Marcondes Cesar Junior
Support Opportunities: Research Projects - Thematic Grants
FAPESP's process: 22/15304-4 - Learning context rich representations for computer vision
Grantee: Nina Sumiko Tomita Hirata
Support Opportunities: Research Projects - Thematic Grants
FAPESP's process: 22/12204-9 - Development of methods for image captioning: a framework based on computer vision and natural language processing
Grantee: Rafael Jeferson Pezzuto Damaceno
Support Opportunities: Scholarships in Brazil - Post-Doctoral