

ORTree: Tuning Diversified Similarity Queries by Means of Data Partitioning

Author(s):
de Oliveira Novaes, Joao Victor ; Dutra Santos, Lucio Fernandes ; Traina, Agma Juci Machado ; Traina, Caetano, Jr. ; Chiusano, S ; Cerquitelli, T ; Wrembel, R
Total Authors: 7
Document type: Journal article
Source: ADVANCES IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2022; v. 13389, p. 14-pg., 2022-01-01.
Abstract

As modern applications gather more and more data, the data types also become more complex. Traditional retrieval operations based on identity and order comparisons are not suitable for those types. Instead, similarity operators are better suited to querying complex data and are gaining increasing attention. Similarity queries retrieve the elements most similar to a query center, but they tend to return elements that are very similar to others in the result set, reducing users' interest in the answer. To overcome this problem, researchers have considered incorporating a degree of diversity into the similarity operators. Unfortunately, diversified similarity queries are computationally expensive, as they need to assess the relationship between each pair of elements in the result. Several works in the literature present techniques to speed up diversity in similarity queries, but they are either not scalable or only consider the diversity property. In this paper, we propose an index data structure, called the Omni-Range Tree (ORTree), that partitions the query space into a small subset of elements similar to a query element and prospects representative candidates for answering diversified similarity queries. Our experimental evaluation shows that our index structure can reduce query execution time by up to 95% without harming the quality of the results when compared with other methods in the literature. (AU)
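
To make the contrast between a plain and a diversified similarity query concrete, the following is a minimal illustrative sketch. It is not the ORTree and not the authors' method: it assumes a simple greedy strategy with a hypothetical minimum separation distance (`min_sep`), Euclidean distance, and a linear scan over an in-memory list. All function and parameter names are illustrative assumptions.

```python
import math


def euclidean(a, b):
    # Euclidean distance between two equal-length coordinate tuples.
    return math.dist(a, b)


def knn(data, query, k, dist=euclidean):
    # Plain k-nearest-neighbor query: the k elements closest to the query center.
    return sorted(data, key=lambda x: dist(x, query))[:k]


def diversified_knn(data, query, k, min_sep, dist=euclidean):
    # Greedy diversified query (an assumed, simplified formulation):
    # scan candidates by increasing distance to the query and keep one
    # only if it is at least min_sep away from every element already kept.
    # The inner check compares the candidate against the whole partial
    # result, which is where the pairwise cost comes from.
    result = []
    for cand in sorted(data, key=lambda x: dist(x, query)):
        if all(dist(cand, chosen) >= min_sep for chosen in result):
            result.append(cand)
            if len(result) == k:
                break
    return result


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (2.0, 0.0), (0.0, 2.0)]
query = (0.0, 0.0)

plain = knn(points, query, 3)
diverse = diversified_knn(points, query, 3, min_sep=1.0)
```

In this toy data set the plain query returns three near-duplicate points clustered around the query center, while the diversified variant skips the near-duplicates and returns points that are farther apart. An index such as the one proposed in the paper aims to avoid the full scan and sort that this sketch relies on.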

FAPESP's process: 16/17078-0 - Mining, indexing and visualizing Big Data in clinical decision support systems (MIVisBD)
Grantee: Agma Juci Machado Traina
Support Opportunities: Research Projects - Thematic Grants
FAPESP's process: 20/07200-9 - Analyzing complex data from COVID-19 to support decision making and prognosis
Grantee: Agma Juci Machado Traina
Support Opportunities: Regular Research Grants