
Cross Domain Visual Search with Feature Learning using Multi-stream Transformer-based Architectures

Author(s):
Leo Sampaio Ferraz Ribeiro
Total Authors: 1
Document type: Doctoral Thesis
Press: São Carlos.
Institution: Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:
Examining board members:
Moacir Antonelli Ponti; José Manuel Saavedra Rondo; Diego Furtado Silva; Ricardo da Silva Torres
Advisor: Moacir Antonelli Ponti
Abstract

Within the general field of Computer Vision, the task of Cross-domain Visual Search is one of the most useful and widely studied, yet it is rarely encountered in our daily lives. In this thesis we explore Cross-domain Visual Search using the specific and mature Sketch-based Image Retrieval (SBIR) task as a canvas. We pose four distinct hypotheses as to how to further the field and demonstrate their validity with each contribution. First we present a new architecture for sketch representation learning, called Sketchformer, that forgoes traditional Convolutional networks in favour of the recent Transformer design. Then we explore two alternative definitions of the SBIR task that each approach the scale and generalisation necessary for deployment in the real world. For both tasks we introduce state-of-the-art models: our Scene Designer combines traditional multi-stream networks with a Graph Neural Network to learn representations for sketched scenes with multiple objects; our Sketch-an-Anchor shows that it is possible to harvest general knowledge from pre-trained models for the Zero-shot SBIR task. These contributions have a direct impact on the literature of sketch-based tasks and a cascaded impact on Image Understanding and Cross-domain representations at large. (AU)
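The core retrieval mechanism shared by the models described above can be illustrated in isolation: a sketch and gallery images are mapped by domain-specific encoders into a shared embedding space, and retrieval reduces to ranking the gallery by similarity to the sketch embedding. The following minimal sketch assumes toy hand-written vectors standing in for encoder outputs (the encoders, identifiers, and dimensions here are illustrative, not the thesis's actual models):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, gallery):
    """Rank gallery items (id, embedding) by similarity to the query, best first."""
    return sorted(gallery, key=lambda item: cosine(query_vec, item[1]), reverse=True)

# Toy embeddings standing in for the outputs of a sketch encoder
# and an image encoder trained to share one embedding space.
sketch_query = [0.9, 0.1, 0.0]
gallery = [
    ("cat_photo", [0.8, 0.2, 0.1]),
    ("car_photo", [0.0, 0.9, 0.4]),
    ("dog_photo", [0.1, 0.0, 0.9]),
]

ranking = [img_id for img_id, _ in retrieve(sketch_query, gallery)]
print(ranking)  # gallery images ordered from most to least similar
```

In practice the embeddings come from learned networks (e.g. a Transformer over stroke sequences on the sketch side), but the ranking step is exactly this nearest-neighbour search in the joint space.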

FAPESP's process: 17/22366-8 - Generative networks and feature learning for cross domain visual search
Grantee: Leo Sampaio Ferraz Ribeiro
Support Opportunities: Scholarships in Brazil - Doctorate (Direct)