Advanced search
Start date
Betweenand

Suffix sorting and string similarity measures

Grant number: 17/09105-0
Support Opportunities:Scholarships in Brazil - Post-Doctoral
Start date: August 01, 2017
End date: August 13, 2019
Field of knowledge:Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator:Zhao Liang
Grantee:Felipe Alves da Louza
Host Institution: Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto (FFCLRP). Universidade de São Paulo (USP). Ribeirão Preto , SP, Brazil
Associated research grant:15/50122-0 - Dynamic phenomena in complex networks: basics and applications, AP.TEM
Associated scholarship(s):18/21509-2 - Burrows-Wheeler transform and succinct De Bruijn graphs, BE.EP.PD

Abstract

Suffix sorting is a fundamental problem in string processing that arises in many applications of Bioinformatics, Information Retrieval and Text Mining. This problem is related to the suffix array construction and the Burrows-Wheeler transform, which play an important role in text indexing (e.g. FM-index) and data compression (e.g. bzip). Several algorithms have been proposed to construct the suffix array for a single string and to obtain the Burrows-Wheeler transform. However, in many situations we are interested in processing a string collection and obtain directly the Burrows-Wheeler transform, that is, without constructing the suffix array. Less emphasis has been put on specific algoritmos for this problems. Therefore, the main goal of this project is to investigate the direct computation of the Burrows-Wheeler transform for a collection of strings. Moreover, we will investigate string similarity measures that use the Burrows-Wheeler transform to compare and classify strings.

News published in Agência FAPESP Newsletter about the scholarship:
More itemsLess items
Articles published in other media outlets ( ):
More itemsLess items
VEICULO: TITULO (DATA)
VEICULO: TITULO (DATA)

Scientific publications (8)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
EGIDI, LAVINIA; LOUZA, FELIPE A.; MANZINI, GIOVANNI. Space Efficient Merging of de Bruijn Graphs and Wheeler Graphs. ALGORITHMICA, . (17/09105-0, 18/21509-2)
LOUZA, FELIPE A.; TELLES, GUILHERME P.; GOG, SIMON; ZHAO, LIANG. Algorithms to compute the Burrows-Wheeler Similarity Distribution. THEORETICAL COMPUTER SCIENCE, v. 782, p. 145-156, . (17/09105-0, 18/21509-2, 15/50122-0)
LOUZA, FELIPE A.; TELLES, GUILHERME P.; HOFFMANN, STEVE; CIFERRI, CRISTINA D. A.. Generalized enhanced suffix array construction in external memory. Algorithms for Molecular Biology, v. 12, . (11/23904-7, 17/09105-0, 11/15423-9)
NOGUEIRA NUNES, DANIEL SAAD; LOUZA, FELIPE A.; GOG, SIMON; AYALA-RINCON, MAURICIO; NAVARRO, GONZALO; BILGIN, A; MARCELLIN, MW; SERRASAGRISTA, J; STORER, JA. A Grammar Compression Algorithm based on Induced Suffix Sorting. 2018 DATA COMPRESSION CONFERENCE (DCC 2018), v. N/A, p. 10-pg., . (17/09105-0)
EGIDI, LAVINIA; LOUZA, FELIPE A.; MANZINI, GIOVANNI; TELLES, GUILHERME P.. External memory BWT and LCP computation for sequence collections with applications. Algorithms for Molecular Biology, v. 14, . (17/09105-0, 18/21509-2)
LOUZA, FELIPE A.; SMYTH, W. F.; MANZINIC, GIOVANNI; TELLES, GUILHERME P.. Lyndon array construction during Burrows-Wheeler inversion. JOURNAL OF DISCRETE ALGORITHMS, v. 50, p. 2-9, . (17/09105-0)
LOUZA, FELIPE A.; MHASKAR, NEERJA; SMYTH, W. F.. A new approach to regular & indeterminate strings. THEORETICAL COMPUTER SCIENCE, v. 854, p. 105-115, . (17/09105-0)
LOUZA, FELIPE A.; TELLES, GUILHERME P.; GOG, SIMON; ZHAO, LIANG; GAGIE, T; MOFFAT, A; NAVARRO, G; CUADROSVARGAS, E. Computing Burrows-Wheeler Similarity Distributions for String Collections. STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2018, v. 11147, p. 12-pg., . (17/09105-0)