Advanced search
Start date
Betweenand


Clustering complex data for processing constrained similarity queries

Full text
Author(s):
Jessica Andressa de Souza
Total Authors: 1
Document type: Doctoral Thesis
Press: São Carlos.
Institution: Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:
Examining board members:
Agma Juci Machado Traina; Joaquim Cezar Felipe; Marcela Xavier Ribeiro; Elaine Parros Machado de Sousa
Advisor: Agma Juci Machado Traina
Abstract

Due to the technological advances over the last years, both the amount and variety of data available have been increased at a fast pace. Thus, this scenario has influenced the development of effective strategies for the processing, summarizing, as well as to provide fast and automatic understanding of such data. The Access Methods are strategies that have been explored by researchers in the area to aid these purposes. These methods aim to effectively index data to reduce the time required for processing similarity querying. In addition, they have been applied to aid the processing of Data Mining techniques, such as Clustering Detection. Among the access methods, the metric structures are constructed applying only the criterion based on the distance computation between the elements of the dataset, i.e. similarity operations on the intrinsic characteristics of the dataset. Thus, the results do not always correspond to the context desired by users. This work explored the development of algorithms that allow metric access methods to process queries with a higher semantic load, aimed at contributing to the treatment of the quality question on the results of approaches that involve similarity operation (for example, data mining techniques and similarity queries). In this context, three approaches have been developed: the first approach presents the method clusMAM (Unsupervised Clustering using Metric Access Methods), which aims to display a clustering from a dataset with the application of a Metric Access Method from a summarized set. The second approach presents the CCkNN approach to dealing with the problem of multi-class constraints on the search space. Finally, the third proposal presents the method CfQ (Clustering for Querying) by integrating the techniques clusMAM with CCkNN, using the positive points of each strategy applied by the algorithms. In general, the experiments carried out showed that the proposed methods can contribute to an effective way of reducing similarity computations, which is required during a processing of techniques that are based on distance computations. (AU)

FAPESP's process: 13/21378-1 - Study and Development of Metric Access Methods Using Semantic Data Grouping
Grantee:Jessica Andressa de Souza
Support Opportunities: Scholarships in Brazil - Doctorate