Busca avançada
Ano de início
Entree


Large-Scale Distributed Locality-Sensitive Hashing for General Metric Data

Texto completo
Autor(es):
Silva, Eliezer ; Teixeira, Thiago ; Teodoro, George ; Valle, Eduardo ; Traina, AJM ; Traina, C ; Cordeiro, RLF
Número total de Autores: 7
Tipo de documento: Artigo Científico
Fonte: SIMILARITY SEARCH AND APPLICATIONS; v. 8821, p. 12-pg., 2014-01-01.
Resumo

Locality-Sensitive Hashing (LSH) is extremely competitive for similarity search, but works under the assumption of uniform access cost to the data, and for just a handful of dissimilarities for which locality-sensitive families are available. In this work we propose Parallel Voronoi LSH, an approach that addresses those two limitations of LSH: it makes LSH efficient for distributed-memory architectures, and it works for very general dissimilarities (in particular, it works for all metric dissimilarities). Each hash table of Voronoi LSH works by selecting a sample of the dataset to be used as seeds of a Voronoi diagram. The Voronoi cells are then used to hash the data. Because Voronoi diagrams depend only on the distance, the technique is very general. Implementing LSH in distributed-memory systems is very challenging because it lacks referential locality in its access to the data: if care is not taken, excessive message-passing ruins the index performance. Therefore, another important contribution of this work is the parallel design needed to allow the scalability of the index, which we evaluate in a dataset of a thousand million multimedia features. (AU)

Processo FAPESP: 10/05647-4 - Computação forense e criminalística de documentos: coleta, organização, classificação e análise de evidências
Beneficiário:Anderson de Rezende Rocha
Modalidade de apoio: Auxílio à Pesquisa - Jovens Pesquisadores