The Effects of Unimodal Representation Choices on Multimodal Learning

Author(s):
Ito, Fernando Tadao ; Caseli, Helena de Medeiros ; Moreira, Jander
Total Authors: 3
Document type: Conference paper
Source: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018); 8 pp., 2018.
Abstract

Multimodal representations are distributed vectors that map multiple modes of information into a single mathematical space, where distances between instances reflect their similarity. In most cases, a single unimodal representation technique is deemed sufficient for each mode when creating multimodal spaces. In this paper, we investigate how different unimodal representations can be combined, and argue that the way they are combined can affect the performance, representation accuracy, and classification metrics of the resulting multimodal methods. In the experiments presented in this paper, we used a dataset of product images and text descriptions extracted from an e-commerce site in Brazil. On this dataset, we tested our hypothesis on common classification problems to evaluate how multimodal representations differ according to their component unimodal representation methods. For this domain, we selected eight unimodal representation methods: LSI, LDA, Word2Vec, and GloVe for text; SIFT, SURF, ORB, and VGG19 for images. Multimodal representations were built with a multimodal deep autoencoder and a bidirectional deep neural network.
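To make the fusion step concrete, the sketch below shows how a multimodal deep autoencoder of the kind the abstract describes might combine a unimodal text vector (e.g., averaged Word2Vec embeddings) with a unimodal image vector (e.g., VGG19 activations) into one shared code. This is a minimal PyTorch sketch under stated assumptions; all layer sizes, dimensions, and names are illustrative and are not the configuration reported in the paper.

```python
# Minimal sketch (assumed: PyTorch) of a multimodal deep autoencoder that
# fuses a unimodal text vector with a unimodal image vector into one shared
# multimodal space. Dimensions and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MultimodalAutoencoder(nn.Module):
    def __init__(self, text_dim=300, image_dim=4096, shared_dim=128):
        super().__init__()
        # Encoder: concatenated unimodal vectors -> shared multimodal code.
        self.encoder = nn.Sequential(
            nn.Linear(text_dim + image_dim, 512),
            nn.ReLU(),
            nn.Linear(512, shared_dim),
        )
        # Decoder: shared code -> reconstruction of both unimodal inputs.
        self.decoder = nn.Sequential(
            nn.Linear(shared_dim, 512),
            nn.ReLU(),
            nn.Linear(512, text_dim + image_dim),
        )

    def forward(self, text_vec, image_vec):
        fused = torch.cat([text_vec, image_vec], dim=-1)
        code = self.encoder(fused)   # shared multimodal representation
        recon = self.decoder(code)
        return code, recon

# Training minimizes reconstruction error, so the shared code must retain
# information from both modalities; the code can then feed a classifier.
model = MultimodalAutoencoder()
text_vec = torch.randn(32, 300)    # e.g., averaged Word2Vec embeddings
image_vec = torch.randn(32, 4096)  # e.g., VGG19 fully connected activations
code, recon = model(text_vec, image_vec)
loss = nn.functional.mse_loss(recon, torch.cat([text_vec, image_vec], dim=-1))
loss.backward()
```

Under this setup, swapping the unimodal inputs (say, GloVe for Word2Vec, or SIFT descriptors for VGG19 features) changes only the input vectors, which is the kind of comparison across component representations the experiments evaluate.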

FAPESP's process: 16/13002-0 - MMeaning - multimodal distributional semantic models
Grantee: Helena de Medeiros Caseli
Support Opportunities: Regular Research Grants