Advanced search
Start date
Betweenand


DBM-tree: metric access method sensitive to local density data

Full text
Author(s):
Marcos Rodrigues Vieira
Total Authors: 1
Document type: Master's Dissertation
Press: São Carlos.
Institution: Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:
Examining board members:
Caetano Traina Junior; Sergio Lifschitz; Luis Gustavo Nonato
Advisor: Caetano Traina Junior
Abstract

A metric space is defined as a set of objects and a metric distance function that is used to measure the similarity between these objects. It allows the development of Metric Access Methods (MAMs) that are able to answer similarity queries in these datasets quickly. Usually these MAMs are materialized through a hierarchical structure called metric trees. These trees are kept balanced because it tends to maintain the height of the tree small, aiming to reduce the number of disk access required to answer queries. However, it is difficult to maintain the tree balanced without overlapping nodes covering a large number of objects, leading to the degradation of query performance. In other words, reducing the overlap among nodes increases the performance of metric trees. A possible solution is to relax the need to keep metric trees balanced. This work presents a new dynamic MAM called DBM-tree (Density-Based Metric tree), which changes the rule that imposes a rigid balancing policy, allowing a small amount of unbalancing in some regions of it. This unbalancing minimizes the degree of overlapping among some high-density nodes and, consequently, increases query answering performance. The amount of relaxation is set by the user and is strongly enforced in the tree. The height of the tree is higher in high-density regions, in order to keep a balance between searching in various subtrees and searching deeply in each subtree. The DBM-tree has an optimization algorithm called DBM-Slim-Down that improves the performance in trees through reorganizing the elements among its nodes. The experiments performed over synthetic and real-world datasets showed that the DBM-tree outperforms the traditional MAMs. The DBM-tree is, in average, 50% faster than traditional MAMs and reduces the number of distance calculations and disk accesses up to 50%. After executing the DBM-Slim-Down algorithm, the performance achieves improvements up to 30% for range and k-nearest neighbor queries. Moreover, the DBM-tree is scalable regarding time, number of disk accesses and distance calculations. (AU)