Advanced search
Start date
Betweenand


Large-Scale Distributed Locality-Sensitive Hashing for General Metric Data

Full text
Author(s):
Silva, Eliezer ; Teixeira, Thiago ; Teodoro, George ; Valle, Eduardo ; Traina, AJM ; Traina, C ; Cordeiro, RLF
Total Authors: 7
Document type: Journal article
Source: SIMILARITY SEARCH AND APPLICATIONS; v. 8821, p. 12-pg., 2014-01-01.
Abstract

Locality-Sensitive Hashing (LSH) is extremely competitive for similarity search, but works under the assumption of uniform access cost to the data, and for just a handful of dissimilarities for which locality-sensitive families are available. In this work we propose Parallel Voronoi LSH, an approach that addresses those two limitations of LSH: it makes LSH efficient for distributed-memory architectures, and it works for very general dissimilarities (in particular, it works for all metric dissimilarities). Each hash table of Voronoi LSH works by selecting a sample of the dataset to be used as seeds of a Voronoi diagram. The Voronoi cells are then used to hash the data. Because Voronoi diagrams depend only on the distance, the technique is very general. Implementing LSH in distributed-memory systems is very challenging because it lacks referential locality in its access to the data: if care is not taken, excessive message-passing ruins the index performance. Therefore, another important contribution of this work is the parallel design needed to allow the scalability of the index, which we evaluate in a dataset of a thousand million multimedia features. (AU)

FAPESP's process: 10/05647-4 - Digital forensics: collection, organization, classification and analysis of digital evidences
Grantee:Anderson de Rezende Rocha
Support Opportunities: Research Grants - Young Investigators Grants