Fast Local Spatial Verification for Feature-Agnostic Large-Scale Image Retrieval

Brogan, Joel; Bharati, Aparna; Moreira, Daniel; Rocha, Anderson; Bowyer, Kevin W.; Flynn, Patrick J.; Scheirer, Walter J.

Author(s):	Brogan, Joel ^[1]; Bharati, Aparna ^[2]; Moreira, Daniel ^[3]; Rocha, Anderson ^[4]; Bowyer, Kevin W. ^[3]; Flynn, Patrick J. ^[3]; Scheirer, Walter J. ^[3]
Total Authors: 7
Affiliation:	^[1] Oak Ridge Natl Lab, Multimodal Sensor Analyt MSA Grp, POB 2009, Oak Ridge, TN 37830 - USA
^[2] Lehigh Univ, Dept Comp Sci & Engn, Bethlehem, PA 18015 - USA
^[3] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 - USA
^[4] Univ Estadual Campinas, Inst Comp, BR-13083970 Campinas, SP - Brazil
Total Affiliations: 4
Document type:	Journal article
Source:	IEEE Transactions on Image Processing; v. 30, p. 6892-6905, 2021.
Web of Science Citations:	0
Abstract
Images from social media can reflect diverse viewpoints, heated arguments, and expressions of creativity, adding new complexity to retrieval tasks. Researchers working on Content-Based Image Retrieval (CBIR) have traditionally tuned their algorithms to match filtered results with user search intent. However, we are now bombarded with composite images of unknown origin, authenticity, and even meaning. With such uncertainty, users may not have an initial idea of what the search query results should look like. For instance, hidden people, spliced objects, and subtly altered scenes can be difficult for a user to detect initially in a meme image, but may contribute significantly to its composition. It is pertinent to design systems that retrieve images with these nuanced relationships in addition to providing more traditional results, such as duplicates and near-duplicates, and to do so with enough efficiency at large scale. We propose a new approach for spatial verification that aims at modeling object-level regions using image keypoints retrieved from an image index, which is then used to accurately weight small contributing objects within the results, without the need for costly object detection steps. We call this method the Objects in Scene to Objects in Scene (OS2OS) score, and it is optimized for fast matrix operations, which can run quickly on either CPUs or GPUs. It performs comparably to state-of-the-art methods on classic CBIR problems (Oxford 5K, Paris 6K, and Google-Landmarks), and outperforms them in emerging retrieval tasks such as image composite matching in the NIST MFC2018 dataset and meme-style imagery from Reddit.

FAPESP's process:	17/12646-3 - Déjà vu: feature-space-time coherence from heterogeneous data for media integrity analytics and interpretation of events
Grantee:	Anderson de Rezende Rocha
Support Opportunities:	Research Projects - Thematic Grants
