Unsupervised learning for multimodal multimedia retrieval

Author(s):
Lucas Barbosa de Almeida
Total Authors: 1
Document type: Master's Dissertation
Published: Rio Claro, 2022-05-13.
Institution: Universidade Estadual Paulista (Unesp). Instituto de Geociências e Ciências Exatas. Rio Claro
Advisor: Daniel Carlos Guimarães Pedronette
Abstract

Given the rapid growth of multimedia collections, whether videos, audio, or images, and the scarcity of labeled data, it is essential to investigate unsupervised approaches to content-based information retrieval. Since information from different modalities or representations of the same object tends to be complementary, exploiting multiple modalities in the retrieval process is equally important. However, using information from different modalities raises the challenge of how to combine these distinct sources. In this dissertation, combination approaches based on multiple rankings and unsupervised learning methods are investigated. In general, such methods exploit contextual relationships between objects, usually encoded in the similarity information of the collections, without requiring labeled data or user intervention. Furthermore, recent approaches based on Graph Convolutional Networks (GCNs) are considered. GCN training is traditionally performed so that each node communicates with its neighborhood, incorporating information from the nodes to which it is connected in the graph. In this work, we combine the ability of unsupervised learning methods to exploit the geometry of the dataset and define a contextual distance measure with the ability of GCNs to create a more effective representation of each instance, in order to improve video retrieval results in unsupervised and multimodal scenarios. The work presents a bibliographic survey, discusses feature extraction methods for different modalities, and proposes multimedia retrieval methods capable of combining information from different modalities in two different scenarios.
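The neighborhood communication described above can be illustrated with a minimal sketch of one GCN propagation step. This is a generic, standard GCN layer (symmetric normalization with self-loops), not the specific architecture proposed in the dissertation; the toy graph and feature values are assumptions for illustration only.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: each node aggregates features from
    itself and its neighbors through the normalized adjacency matrix."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    return np.maximum(A_norm @ H @ W, 0.0)     # linear transform + ReLU

# toy graph: 3 nodes, edges 0-1 and 1-2; 2-d features, identity weights
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W = np.eye(2)
H_next = gcn_layer(A, H, W)
print(H_next.shape)  # (3, 2)
```

After the step, each row of `H_next` mixes a node's own features with those of its graph neighbors, which is the mechanism the abstract refers to.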
In the first scenario, different approaches are proposed for video retrieval that consider information from different modalities (images, audio, and video) and use unsupervised rank-based learning techniques together with GCNs trained without supervision. In the second scenario, a representation learning method for image retrieval based on the fusion of multimodal representations is proposed. The representation of each image is obtained from features extracted from a sequence composed of its k nearest neighbors, also using unsupervised learning techniques. (AU)
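The second scenario's idea of representing an item through its nearest neighbors can be sketched as follows. This is a deliberately simplified stand-in, assuming cosine similarity and plain averaging of neighbor features, rather than the learned sequence-based encoder the dissertation proposes; the feature matrix is an illustrative assumption.

```python
import numpy as np

def knn_context_representation(X, k):
    """Build a contextual representation of each item from its k nearest
    neighbors under cosine similarity: here, simply the mean of the
    neighbors' feature vectors (the item itself ranks among them)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                          # cosine similarity matrix
    idx = np.argsort(-S, axis=1)[:, :k]    # top-k neighbors per item
    return X[idx].mean(axis=1)             # aggregate neighbor features

# toy features: items 0/1 and items 2/3 form two similar pairs
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
R = knn_context_representation(X, k=2)
print(R.shape)  # (4, 2)
```

Each row of `R` now encodes an item together with its ranking context, so visually similar items end up with closer representations than their raw features alone would give.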

FAPESP's process: 20/03311-0 - Unsupervised learning for general and multimodal multimedia retrieval
Grantee: Lucas Barbosa de Almeida
Support Opportunities: Scholarships in Brazil - Master