

Multi-Script Video Caption Localization Based on Visual Rhythms

Author(s):
Roberto e Souza, Marcos ; Maia, Helena de Almeida ; Souza e Santos, Anderson Carlos ; Vieira, Marcelo Bernardes ; Pedrini, Helio
Total Authors: 5
Document type: Journal article
Source: APPLIED ARTIFICIAL INTELLIGENCE; v. 36, n. 1, 32 pages, 2022-02-05.
Abstract

Video caption localization plays an important role in information retrieval for multimedia applications. In this work, we present and evaluate a novel method for localizing video captions using visual rhythms, which allow a specific feature to be represented and analyzed over time. We build visual rhythms from the text location maps produced by general text localization methods, which are far more common in the literature than caption-oriented ones. We then process these maps to keep only the captions, generating caption localization masks. To meet the need for a large, standardized dataset, we constructed a new one in which captions in thirteen different scripts are added to the video frames, yielding a total of 221 videos with ground truth. Experiments demonstrate that our method achieves competitive results compared with other approaches in the literature. (AU)
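The abstract does not specify how the visual rhythm is sampled or how captions are separated from other text. The following is a minimal Python sketch under assumed choices. It is not the authors' implementation. Each frame's text location map is taken to be a hypothetical H × W grid of scores in [0, 1]. The rhythm column for a frame is the mean of each map row. A row is treated as caption text if its mean stays at or above a threshold for a minimum number of consecutive frames. The function names, `threshold`, and `min_frames` are illustrative assumptions.

```python
from typing import List

# Hypothetical representation: one H x W text location map per frame,
# with scores in [0, 1] (1 = text detected at that pixel).
Map = List[List[float]]


def visual_rhythm(maps: List[Map]) -> List[List[float]]:
    """Build an H x T visual rhythm: column t holds the mean of each row
    of frame t's map, so text that persists over time forms horizontal stripes."""
    if not maps:
        return []
    height = len(maps[0])
    rhythm = [[0.0] * len(maps) for _ in range(height)]
    for t, frame in enumerate(maps):
        for y, row in enumerate(frame):
            rhythm[y][t] = sum(row) / len(row)
    return rhythm


def persistent_rows(rhythm: List[List[float]], threshold: float = 0.5,
                    min_frames: int = 3) -> List[int]:
    """Assumed caption rule: keep rows whose activity is >= threshold
    for at least min_frames consecutive frames."""
    rows = []
    for y, series in enumerate(rhythm):
        run = best = 0
        for value in series:
            run = run + 1 if value >= threshold else 0
            best = max(best, run)
        if best >= min_frames:
            rows.append(y)
    return rows


def caption_mask(frame: Map, rows: List[int]) -> Map:
    """Zero out every row of a frame's map that was not judged to be caption text."""
    keep = set(rows)
    return [[v if y in keep else 0 for v in row] for y, row in enumerate(frame)]


# Usage: row 2 holds a caption in every frame; row 0 flickers in frame 0 only.
frames = []
for t in range(4):
    frame = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
    if t == 0:
        frame[0] = [1, 1, 0, 0]
    frames.append(frame)

rhythm = visual_rhythm(frames)
rows = persistent_rows(rhythm, threshold=0.5, min_frames=3)
print(rows)  # [2]
print(caption_mask(frames[0], rows))  # [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
```

The temporal persistence test is what separates captions from transient scene text in this sketch. Because captions usually remain at a fixed position across many frames, they appear as long horizontal runs in the rhythm image.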

FAPESP's process: 17/12646-3 - Déjà vu: feature-space-time coherence from heterogeneous data for media integrity analytics and interpretation of events
Grantee: Anderson de Rezende Rocha
Support Opportunities: Research Projects - Thematic Grants