Multi-Scale Patch Partitioning for Image Inpainting Based on Visual Transformers

Campana, Jose Luis Flores; Decker, Luis Gustavo Lorgus; Roberto e Souza, Marcos; Maia, Helena de Almeida; Pedrini, Helio; DeCarvalho, BM; Goncalves, LMG

Author(s):	Campana, Jose Luis Flores ; Decker, Luis Gustavo Lorgus ; Roberto e Souza, Marcos ; Maia, Helena de Almeida ; Pedrini, Helio ; DeCarvalho, BM ; Goncalves, LMG Total number of authors: 7
Document type:	Scientific Article
Source:	2022 35TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI 2022); v. N/A, p. 6-pg., 2022-01-01.
Abstract
Image inpainting is a challenging task that aims to reconstruct missing pixels with semantically coherent content and realistic texture using the available information. Modern inpainting works rely on neural networks to generate realistic images. However, due to the limited receptive field of convolution operators, they may produce distorted content when a large region needs to be filled. Recent methods have employed transformers to deal with this problem, but their high computational cost makes it difficult to work with global image information. To address this, we propose a multi-scale patch partitioning strategy to subdivide feature maps into non-overlapping patches, and a transformer with a variable number of heads to control the growth of the computational cost according to the number of patches. Smaller patches enable broader image coverage, helping to recover structural information, whereas larger patches lead to a reduced computational cost. In contrast to the fixed and small patch sizes employed by other methods in the literature, here we explore different patch sizes in the transformer blocks to achieve a good balance between the computational cost and the number of pixel references used in the reconstruction. Extensive experiments on three datasets show that our method achieves very competitive results compared to the state of the art, reaching the best scores in various scenarios, especially for metrics based on human perception. Moreover, our model has the smallest size. Our qualitative results suggest that the proposed method is able to reconstruct structural content such as parts of human faces.
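The trade-off between patch size and cost described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 32 x 32 single-channel feature map, and the patch sizes 2, 4, and 8 are all assumptions chosen for illustration, and the variable number of attention heads is not modeled. The sketch only shows how non-overlapping partitioning turns a feature map into tokens, and how the number of scores in a full self-attention map grows quadratically with the token count.

```python
def partition_patches(feature_map, patch):
    """Split an H x W grid (list of rows) into non-overlapping patch x patch tiles.

    Returns a list of tokens; each token is the flattened content of one tile.
    Assumes H and W are divisible by `patch` (an assumption of this sketch).
    """
    height, width = len(feature_map), len(feature_map[0])
    if height % patch or width % patch:
        raise ValueError("feature map size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one tile row by row into a single token.
            tile = [feature_map[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            tokens.append(tile)
    return tokens


def attention_pairs(num_tokens):
    """Number of token-to-token scores in one full self-attention map."""
    return num_tokens * num_tokens


# Hypothetical 32 x 32 single-channel feature map (values are just indices).
SIZE = 32
feature_map = [[r * SIZE + c for c in range(SIZE)] for r in range(SIZE)]

# Hypothetical patch sizes; maps patch size to (tokens, token length, scores).
summary = {}
for patch in (2, 4, 8):
    tokens = partition_patches(feature_map, patch)
    summary[patch] = (len(tokens), len(tokens[0]), attention_pairs(len(tokens)))
    print(f"patch {patch}x{patch}: {summary[patch][0]} tokens of length "
          f"{summary[patch][1]}, {summary[patch][2]} attention scores")
```

In this toy setting, doubling the patch side cuts the token count by a factor of 4 and the number of attention scores by a factor of 16, which is the kind of cost reduction the abstract attributes to larger patches, while smaller patches give finer-grained tokens at a higher cost.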

FAPESP grant:	17/12646-3 - Déjà vu: temporal, spatial, and characterization coherence of heterogeneous data for integrity analysis and interpretation
Grantee:	Anderson de Rezende Rocha
Support type:	Research Grant - Thematic
