Applying the Perseus Technique to Index Huge Nucleotide Sequences

Grant number: 09/15485-4
Support type: Scholarships in Brazil - Scientific Initiation
Effective date (Start): January 01, 2010
Effective date (End): December 31, 2011
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator:Cristina Dutra de Aguiar Ciferri
Grantee:Felipe Alves da Louza
Home Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil

Abstract

The research group headed by the supervisor of this project is developing Perseus, a novel technique for handling persistent suffix trees. Perseus has the following distinctive properties. It is based on an approach that constructs persistent suffix trees whose sizes may exceed the main memory capacity. Furthermore, it provides an algorithm that allows users to indicate which substrings of the input string should be indexed, according to the requirements of their applications. Moreover, it proposes an extended exact matching algorithm that searches for a query string in suffix trees that may be partitioned.

This project aims to introduce extensions to Perseus so that the technique can be used to index huge nucleotide sequences. Specifically, it aims to develop a strategy for using Perseus when the memory required to store the string being indexed is larger than the main memory capacity. The project also aims to investigate the execution of approximate queries and to perform experimental tests that make it possible to compare our work with related work.
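To make the idea of exact matching over a suffix index concrete, the sketch below builds a small, uncompressed, in-memory suffix trie over a nucleotide string and reports where a query occurs. It is a textbook illustration only, not the Perseus technique: it is neither persistent nor partitioned. The optional `starts` parameter, which restricts indexing to suffixes beginning at chosen positions, is a hypothetical stand-in for the idea of letting users choose what to index; the names `SuffixTrieNode`, `build_suffix_trie` and `exact_match` are likewise illustrative assumptions.

```python
from typing import Dict, Iterable, List, Optional


class SuffixTrieNode:
    """Node of an uncompressed suffix trie (one character per edge)."""

    def __init__(self) -> None:
        self.children: Dict[str, "SuffixTrieNode"] = {}
        # Start positions of every indexed suffix whose path passes through this node.
        self.positions: List[int] = []


def build_suffix_trie(text: str, starts: Optional[Iterable[int]] = None) -> SuffixTrieNode:
    """Insert the suffixes of `text` that begin at `starts` (every position by default).

    Hypothetical illustration: restricting `starts` mimics indexing only
    user-selected parts of the sequence; it is not Perseus's actual mechanism.
    """
    root = SuffixTrieNode()
    if starts is None:
        starts = range(len(text))
    for i in starts:
        node = root
        node.positions.append(i)
        # Walk the suffix text[i:], creating one node per character as needed.
        for ch in text[i:]:
            node = node.children.setdefault(ch, SuffixTrieNode())
            node.positions.append(i)
    return root


def exact_match(root: SuffixTrieNode, query: str) -> List[int]:
    """Return the sorted start positions of `query` among the indexed suffixes."""
    node = root
    for ch in query:
        child = node.children.get(ch)
        if child is None:
            return []  # The query leaves the trie, so there is no occurrence.
        node = child
    return sorted(node.positions)


if __name__ == "__main__":
    trie = build_suffix_trie("ACGTACGA")
    print(exact_match(trie, "ACG"))
    print(exact_match(trie, "TT"))
    # Index only the suffixes that start in the first half of the sequence.
    partial = build_suffix_trie("ACGTACGA", starts=range(4))
    print(exact_match(partial, "ACG"))
```

Because every suffix is stored character by character, this trie needs memory quadratic in the sequence length. That is why practical suffix trees compress unary paths, and why indexes for huge sequences must, as the abstract describes, grow beyond main memory.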