Multi-Script Video Caption Localization Based on Visual Rhythms

Author(s):	Roberto e Souza, Marcos ; Maia, Helena de Almeida ; Souza e Santos, Anderson Carlos ; Vieira, Marcelo Bernardes ; Pedrini, Helio
Total Authors:	5
Document type:	Journal article
Source:	APPLIED ARTIFICIAL INTELLIGENCE; v. 36, n. 1, p. 32-pg., 2022-02-05.
Abstract
Video caption localization plays an important role in information retrieval for multimedia applications. In this work, we present and evaluate a novel method for localizing video captions using visual rhythms, which enable the representation and analysis of a specific feature over time. We build visual rhythms from the text location maps produced by general text localization methods, which are far more common in the literature than caption-oriented ones. We then process the maps to keep only the captions, generating caption localization masks. To meet the need for a large, standardized dataset, we constructed a new one in which captions in thirteen different scripts are added to the video frames, yielding a total of 221 videos with ground truth. Experiments demonstrate that our method achieves competitive results compared with other approaches in the literature. (AU)
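
To illustrate the idea in the abstract, here is a minimal sketch; it is not the authors' implementation. It assumes the text location maps are available as an array of shape (frames, height, width) with scores in [0, 1]. It also assumes a simplified visual rhythm built by projecting each map onto its rows, and a simple persistence rule (text that stays in the same rows for at least `min_frames` frames is treated as a caption). The function names, the threshold, and the persistence rule are all hypothetical choices made for this example.

```python
import numpy as np

def row_rhythm(text_maps, thresh=0.5):
    """Collapse each (H, W) text map into a column of H values and
    stack the columns over time, giving an (H, T) boolean image.
    Assumption: a row-wise max projection stands in for the rhythm."""
    maps = np.asarray(text_maps, dtype=float)      # (T, H, W)
    return (maps.max(axis=2) >= thresh).T          # (H, T)

def keep_persistent_rows(rhythm, min_frames):
    """Keep only horizontal runs of True that last at least min_frames.
    Assumption: captions persist over time, other scene text does not."""
    kept = np.zeros_like(rhythm, dtype=bool)
    n_frames = rhythm.shape[1]
    for r, row in enumerate(rhythm):
        start = None
        for t in range(n_frames + 1):
            active = t < n_frames and row[t]
            if active and start is None:
                start = t                          # a run begins
            elif not active and start is not None:
                if t - start >= min_frames:        # run is long enough
                    kept[r, start:t] = True
                start = None
    return kept

def caption_masks(text_maps, min_frames=3, thresh=0.5):
    """Back-project the filtered rhythm onto the frames: a pixel is kept
    if it is text and its row is persistent at that frame."""
    maps = np.asarray(text_maps, dtype=float)
    kept = keep_persistent_rows(row_rhythm(maps, thresh), min_frames)
    return (maps >= thresh) & kept.T[:, :, None]   # (T, H, W) boolean
```

Calling `caption_masks(maps, min_frames=3)` on a stack of maps returns one boolean mask per frame. In this sketch, text that appears in a row for fewer than `min_frames` consecutive frames is discarded, while text that stays in place is retained as a caption candidate.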

FAPESP's process:	17/12646-3 - Déjà vu: feature-space-time coherence from heterogeneous data for media integrity analytics and interpretation of events
Grantee:	Anderson de Rezende Rocha
Support Opportunities:	Research Projects - Thematic Grants