Visual question answering task with graph convolution networks

Grant number: 20/14452-4
Support type: Scholarships in Brazil - Master
Effective date (Start): May 01, 2021
Effective date (End): February 28, 2023
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal researcher: Gerberth Adín Ramírez Rivera
Grantee: Bruno César de Oliveira Souza
Home Institution: Instituto de Computação (IC), Universidade Estadual de Campinas (UNICAMP), Campinas, SP, Brazil


Visual Question Answering (VQA) is a task that aims to answer a user's question about a given image. The task typically combines concepts from Computer Vision and Natural Language Processing. Most existing VQA systems merge the extracted image and question features to predict an answer; however, this multi-modal fusion leaves a significant gap in the semantic understanding of the relationship between the image and the question. To achieve a more holistic understanding of the scene, we propose a graph-based approach that combines the question features with those of the input image. The main objective of our research is to advance visual question answering by using a graph representation whose structure improves the connections between features. For this purpose, we need to design architectures that build a graph representation encoding the features of the image's content, the natural-language question, and their relationships. We then intend to use a graph neural network (GNN) to learn, over this VQA graph representation, the relationship between a specific question and the input image it is grounded on, in order to output the correct predicted answer. Finally, to bring more "reason" to our proposal, we aim to use the novel "fact-based" visual question answering (FVQA) task, in which the model is provided with a candidate list of facts related to the question. The method retrieves these facts from a knowledge base (KB) built from different sources of information. (AU)
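The core idea above — placing image-region and question-token features as nodes of a joint graph and propagating information along their relationship edges with a graph convolution — can be sketched as follows. This is a minimal illustrative toy, not the project's architecture: the dimensions, the fully-connected region-token edges, the random weights, and the five-answer vocabulary are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not from the project): 3 image-region nodes and
# 2 question-token nodes, projected into a shared 4-d feature space.
num_regions, num_tokens, dim = 3, 2, 4
region_feats = rng.normal(size=(num_regions, dim))  # stand-ins for visual region features
token_feats = rng.normal(size=(num_tokens, dim))    # stand-ins for question embeddings

# Joint graph: connect every question node to every region node, modelling
# the image-question relationships the proposal wants the model to exploit.
n = num_regions + num_tokens
H = np.vstack([region_feats, token_feats])          # node feature matrix, shape (n, dim)
A = np.zeros((n, n))
A[:num_regions, num_regions:] = 1.0                 # region -> token edges
A[num_regions:, :num_regions] = 1.0                 # token -> region edges

# One graph-convolution layer: H' = ReLU(D^{-1} (A + I) H W)
A_hat = A + np.eye(n)                                # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))             # row-normalise by node degree
W = rng.normal(size=(dim, dim)) * 0.1                # random (untrained) layer weights
H_out = np.maximum(D_inv @ A_hat @ H @ W, 0.0)       # propagate, then ReLU

# Pool the node features into one graph representation and score a
# hypothetical 5-answer vocabulary with a random classifier head.
graph_repr = H_out.mean(axis=0)
answer_scores = rng.normal(size=(5, dim)) @ graph_repr
predicted_answer = int(answer_scores.argmax())
```

In a real system the weights would be trained end-to-end, the adjacency could be learned or derived from object relations rather than fully connected, and multiple GNN layers would let information flow over longer relational paths.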