

The Effects of Unimodal Representation Choices on Multimodal Learning

Author(s):
Ito, Fernando Tadao ; Caseli, Helena de Medeiros ; Moreira, Jander ; Declerck, T ; Calzolari, N ; Choukri, K ; Cieri, C ; Hasida, K ; Isahara, H ; Maegaard, B ; Mariani, J ; Moreno, A ; Odijk, J ; Piperidis, S ; Tokunaga, T ; Goggi, S ; Mazo, H
Total number of authors: 17
Document type: Scientific article
Source: PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018); 8 pp., 2018.
Abstract

Multimodal representations are distributed vectors that map multiple modes of information to a single mathematical space, where distances between instances delineate their similarity. In most cases, a single unimodal representation technique is considered sufficient for each mode when creating multimodal spaces. In this paper, we investigate how different unimodal representations can be combined, and argue that the way they are combined can affect the performance, representation accuracy, and classification metrics of the resulting multimodal methods. In the experiments presented in this paper, we used a dataset composed of images and text descriptions of products extracted from an e-commerce site in Brazil. On this dataset, we tested our hypothesis in common classification problems to evaluate how multimodal representations differ according to their component unimodal representation methods. For this domain, we selected eight unimodal representation methods: LSI, LDA, Word2Vec, and GloVe for text; SIFT, SURF, ORB, and VGG19 for images. Multimodal representations were built by a multimodal deep autoencoder and a bidirectional deep neural network. (AU)
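To make the fusion step concrete, the sketch below shows one common way a multimodal deep autoencoder can combine a text embedding with an image feature vector into a shared code. This is an illustrative assumption, not the authors' implementation: the layer sizes, the concatenation-based fusion, and the 300-dimensional Word2Vec / 4096-dimensional VGG19 input dimensions are placeholders chosen for the example.

# Minimal sketch of a multimodal autoencoder (assumed architecture, PyTorch).
import torch
import torch.nn as nn

class MultimodalAutoencoder(nn.Module):
    def __init__(self, text_dim=300, image_dim=4096, hidden_dim=512, code_dim=128):
        super().__init__()
        # Encoder: concatenated unimodal vectors -> shared multimodal code
        self.encoder = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, code_dim),
        )
        # Decoder: shared code -> reconstruction of both unimodal vectors
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, text_dim + image_dim),
        )

    def forward(self, text_vec, image_vec):
        joint = torch.cat([text_vec, image_vec], dim=-1)
        code = self.encoder(joint)           # shared multimodal representation
        reconstruction = self.decoder(code)  # used only for the training loss
        return code, reconstruction

# Usage: random tensors stand in for real Word2Vec / VGG19 features.
model = MultimodalAutoencoder()
text_vec = torch.randn(8, 300)     # batch of 8 text embeddings
image_vec = torch.randn(8, 4096)   # batch of 8 image feature vectors
code, recon = model(text_vec, image_vec)
loss = nn.functional.mse_loss(recon, torch.cat([text_vec, image_vec], dim=-1))

The shared code vector is what would then be fed to a classifier; swapping the unimodal inputs (e.g., GloVe instead of Word2Vec, or SIFT histograms instead of VGG19 features) changes only the input dimensions, which is what makes the comparison across unimodal choices straightforward.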

FAPESP Grant: 16/13002-0 - MMeaning - multimodal distributed semantic representation
Grantee: Helena de Medeiros Caseli
Support type: Regular Research Grant