
Development of techniques for similarity retrieval of complex data in relational database management systems

Grant number: 14/26678-6
Support type: Regular Research Grants
Duration: May 01, 2015 - April 30, 2017
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator: Caetano Traina Junior
Grantee: Caetano Traina Junior
Home Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Assoc. researchers: Agma Juci Machado Traina; Elaine Parros Machado de Sousa; José Fernando Rodrigues Júnior; Luciana Alvim Santos Romani; Paulo Mazzoncini de Azevedo Marques; Robson Leonardo Ferreira Cordeiro


Relational Database Management Systems (DBMSs) were conceived to store and retrieve large volumes of data in which each item is represented as a series of numbers, dates and short character strings, referred to as "scalar data". As information technology evolves, it is increasingly necessary to organize, store and retrieve other data types as well, such as images, videos, time series and genomic sequences, referred to as "complex data". Scalar data are adequately compared using identity or ordering relations, but those relations are of little help when comparing complex data. For complex data, similarity-based queries are the best option. However, similarity comparison is not yet available in current DBMSs. This project focuses on developing techniques that give relational DBMSs the resources required to handle complex data by similarity, covering the needs of all the main DBMS modules, including: a) extending the SQL language to represent similarity queries; b) establishing a unified definition for algebraic operators that perform comparisons based on identity, order and similarity; c) developing techniques for the logical and physical optimization of query plans; and d) developing efficient indexing and retrieval techniques based on combinations of identity-, order- and similarity-based comparison operators. The knowledge and technology generated have the potential to be applied to many areas of human activity. The project will nonetheless be validated on applications in medicine, such as computer-aided medical diagnosis, and in studies of climate models, benefiting from activities that the GBdI-ICMC-USP already carries out. Thus, besides the great potential for innovation in information technology, applying these techniques in those areas will bring immediate benefits to the entire population. (AU)
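To make the contrast between identity-based and similarity-based comparison concrete, the following is a minimal sketch and not the project's method. It assumes that each complex object has already been reduced to a numeric feature vector and that Euclidean distance is an acceptable dissimilarity measure. The names `range_query`, `knn_query` and the sample data are hypothetical, chosen only for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(items: Dict[str, Vector], center: Vector, radius: float,
                dist: Callable[[Vector, Vector], float] = euclidean) -> List[str]:
    """Return the keys of all items within `radius` of `center` (sequential scan)."""
    return [key for key, vec in items.items() if dist(vec, center) <= radius]


def knn_query(items: Dict[str, Vector], center: Vector, k: int,
              dist: Callable[[Vector, Vector], float] = euclidean) -> List[str]:
    """Return the keys of the `k` items closest to `center` (sequential scan)."""
    ranked = sorted(items, key=lambda key: dist(items[key], center))
    return ranked[:k]


# Hypothetical feature vectors, e.g. extracted from four images.
features = {
    "img_a": (0.0, 0.0),
    "img_b": (1.0, 0.0),
    "img_c": (0.0, 2.0),
    "img_d": (5.0, 5.0),
}
query = (0.1, 0.0)

# Identity comparison finds nothing: no stored vector equals the query exactly.
exact_matches = [key for key, vec in features.items() if vec == query]

# Similarity comparison still retrieves the nearby objects.
within_radius = range_query(features, query, radius=1.0)
nearest_three = knn_query(features, query, k=3)
```

Both functions scan every item, which is the cost that indexing and query-plan optimization techniques of the kind listed in items c) and d) aim to reduce.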