Multimodal Representation Space for Text-Guided Data Generation
Author(s): Veltroni, Wellington Cristiano; Caseli, Helena de Medeiros; Villavicencio, A.; Moreira, V.; Abad, A.; Caseli, H.; Gamallo, P.; Ramisch, C.; Oliveira, H. G.; Paetzold, G. H.
Total Authors: 10
Document type: Journal article
Source: COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2018; v. 11122, 11 pp., 2018-01-01.
Abstract:
Text-image alignment is the task of aligning elements in a text with elements in the image accompanying it. Text-image alignment can be applied, for example, to news articles to improve clarity by explicitly defining the correspondence between regions in the article's image and words or named entities in the article's text. It can also be a useful step in many multimodal applications such as image captioning or image description/comprehension. In this paper we present LinkPICS, an automatic aligner which combines Natural Language Processing (NLP) and Computer Vision (CV) techniques to explicitly define the correspondence between regions of an image (bounding boxes) and elements (words or named entities) in a text. LinkPICS performs the alignment of people and of objects (or animals, vehicles, etc.) as two distinct processes. In the experiments presented in this paper, LinkPICS obtained a precision of 97% in the alignment of people and of 73% in the alignment of objects in articles in Portuguese from a Brazilian news site. (AU)
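The abstract describes aligning detected image regions (bounding boxes) with words or named entities, treating people and objects as two separate alignment processes. The sketch below is a minimal, self-contained illustration of that idea only; the data structures, the left-to-right and label-matching heuristics, and the example inputs are assumptions for illustration and are not the LinkPICS method itself.

```python
# Hypothetical sketch of text-image alignment by label matching.
# The names, heuristics, and inputs here are illustrative assumptions;
# they do not reproduce the LinkPICS implementation described in the paper.
from dataclasses import dataclass


@dataclass
class BoundingBox:
    label: str  # detector class, e.g. "person", "dog", "car"
    x: int
    y: int
    w: int
    h: int


@dataclass
class TextElement:
    surface: str  # word or named entity as it appears in the text
    kind: str     # "PERSON" for named entities, "NOUN" for object words


def align_people(elements, boxes):
    """Pair PERSON entities with person boxes, left to right (assumed heuristic)."""
    people = sorted((b for b in boxes if b.label == "person"), key=lambda b: b.x)
    names = [e for e in elements if e.kind == "PERSON"]
    return list(zip(names, people))


def align_objects(elements, boxes):
    """Pair object words with non-person boxes whose detector label matches the word."""
    pairs = []
    for element in elements:
        if element.kind != "NOUN":
            continue
        for box in boxes:
            if box.label != "person" and box.label == element.surface.lower():
                pairs.append((element, box))
                break
    return pairs


if __name__ == "__main__":
    boxes = [
        BoundingBox("person", 30, 40, 80, 200),
        BoundingBox("dog", 150, 120, 60, 50),
    ]
    text = [
        TextElement("Maria Silva", "PERSON"),
        TextElement("dog", "NOUN"),
    ]
    for element, box in align_people(text, boxes) + align_objects(text, boxes):
        print(f"{element.surface!r} -> {box.label} at ({box.x}, {box.y})")
```

In practice, the text elements would come from an NER/POS pipeline and the boxes from an object detector; the two-process split above simply mirrors the people-versus-objects distinction stated in the abstract.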
FAPESP's process: 16/13002-0 - MMeaning - multimodal distributional semantic models
Grantee: Helena de Medeiros Caseli
Support Opportunities: Regular Research Grants