Advanced search
Start date
Betweenand


Complex similarity queries in relational database management systems

Full text
Author(s):
Adriano Siqueira Arantes
Total Authors: 1
Document type: Doctoral Thesis
Press: São Carlos.
Institution: Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:
Examining board members:
Caetano Traina Junior; Mauro Biajiz; Alberto Henrique Frade Laender; Marta Lima de Queirós Mattoso; Luis Gustavo Nonato
Advisor: Caetano Traina Junior
Abstract

The similarity among elemcnts emerges naturally as the most adequate to ask about complex data (such as, multimédia and genomic sequenees among others). There are two basic similarity queries: Range Query and k-Nearest Neighbor Query. The increasing volume of complex data stored in Database Management Systems (DBMS), makes it neeessary to provide support for these data tvpes. One way to support complex data types in current DBMS is to include similarity queries in its query processor, and consequently, in the relational algebra. This fact leads to produce ways to express such queries in the DBMS language as predicates in select operations. As a consequence, the two basic similarity queries can be combined in more complex expressions involving boolean conjunctions and disjunctions among them, i.e., complex similarity queries. However, for complex similarity queries to be processed efficiently in a DBMS, it is necessary to provide support in the optimization and runtime laycrs of the; query proeessing. There are many works involving the development, of algorithms to answer specific and simple similarity query whereas there is not a generic algorithm efficiently able to handle complex similarity queries. Furthermore, the similarity query optimization is a topic not frequently explored in the literature. This work establishes a structured rnethod 011 how to analyze complex similarity queries. This method is used to extend the relational algebra through algebraic rules and to determine a small set of algorithms that can be used to answer any complex similarity query. In addition, the proposed method makes it possible to formalize rules for selectivity estimation of these. queries thus assisting cost estimation. To validate the concepts presented, experiments are being performed on real and synthetic data sets that highlight meaningful results. The algebraic rules. algorithms and metrics to estimate the selectivity can be employed in the optimization process of a DBMS in order to derive efficient complex similarity query execution plans. Therefore, this work deals with essential poiuts that enable the practical use of similarity (jueries in Relational Database Management Systems. (AU)