Multi-Scale Patch Partitioning for Image Inpainting Based on Visual Transformers

Campana, Jose Luis Flores; Decker, Luis Gustavo Lorgus; Roberto e Souza, Marcos; Maia, Helena de Almeida; Pedrini, Helio; DeCarvalho, BM; Goncalves, LMG

Author(s):	Campana, Jose Luis Flores; Decker, Luis Gustavo Lorgus; Roberto e Souza, Marcos; Maia, Helena de Almeida; Pedrini, Helio; DeCarvalho, BM; Goncalves, LMG
Total Authors:	7
Document type:	Journal article
Source:	2022 35TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI 2022); v. N/A, p. 6-pg., 2022-01-01.
Abstract
Image inpainting is a challenging task that aims to reconstruct missing pixels with semantically coherent content and realistic texture using the available information. Modern inpainting works rely on neural networks to generate realistic images. However, because convolution operators have a limited receptive field, these networks may produce distorted content when a large region needs to be filled. Recent methods have employed transformers to deal with this problem, but their high computational cost makes it difficult to work with global image information. To address this, we propose a multi-scale patch partitioning strategy that subdivides feature maps into non-overlapping patches, and a transformer with a variable number of heads that controls how the computational cost grows with the number of patches. Smaller patches enable broader image coverage, helping to recover structural information, whereas larger patches reduce the computational cost. In contrast to the fixed, small patch sizes employed by other methods in the literature, we explore different patch sizes in the transformer blocks to balance the computational cost against the number of pixel references used in the reconstruction. Extensive experiments on three datasets show that our method achieves very competitive results compared to the state of the art, reaching the best scores in various scenarios, especially for metrics based on human perception. Moreover, our model has the smallest size among the compared methods. Our qualitative results suggest that the proposed method is able to reconstruct structural content such as parts of human faces.
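The trade-off between patch size and cost described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `partition_patches`, the (C, H, W) layout, the 4 x 32 x 32 feature map, and the patch sizes are all assumptions chosen for illustration. The sketch only splits a feature map into non-overlapping patches and reports how the token count N, and the N x N size of a self-attention matrix, change with the patch size.

```python
import numpy as np


def partition_patches(feat, patch):
    """Split a (C, H, W) feature map into non-overlapping patch x patch tokens.

    Hypothetical helper for illustration; returns an array of shape
    (num_patches, C * patch * patch), one flattened token per patch.
    """
    c, h, w = feat.shape
    if h % patch or w % patch:
        raise ValueError("patch size must divide both H and W")
    gh, gw = h // patch, w // patch
    # (C, gh, p, gw, p) -> (gh, gw, C, p, p) -> (gh * gw, C * p * p)
    x = feat.reshape(c, gh, patch, gw, patch)
    x = x.transpose(1, 3, 0, 2, 4)
    return x.reshape(gh * gw, c * patch * patch)


# Assumed toy feature map: 4 channels, 32 x 32 spatial resolution.
feat = np.arange(4 * 32 * 32, dtype=np.float32).reshape(4, 32, 32)

for patch in (4, 8, 16):
    tokens = partition_patches(feat, patch)
    n = tokens.shape[0]
    # Self-attention compares every token with every other one: N * N scores.
    print(f"patch={patch:2d} tokens={n:3d} token_dim={tokens.shape[1]:4d} "
          f"attention_entries={n * n}")
```

With these assumed sizes, halving the patch side quadruples the number of tokens and multiplies the attention matrix size by sixteen, which is the cost growth that motivates using different patch sizes in different transformer blocks.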

FAPESP's process:	17/12646-3 - Déjà vu: feature-space-time coherence from heterogeneous data for media integrity analytics and interpretation of events
Grantee:	Anderson de Rezende Rocha
Support Opportunities:	Research Projects - Thematic Grants
