Development of a multiple alignment system using GPHMMs

Grant number: 10/04409-2
Support Opportunities: Scholarships in Brazil - Master
Start date: August 01, 2010
End date: July 31, 2012
Field of knowledge: Biological Sciences - Biochemistry - Molecular Biology
Principal Investigator: Alan Mitchell Durham
Grantee: Vitor Ferreira Onuchic
Host Institution: Instituto de Matemática e Estatística (IME). Universidade de São Paulo (USP). São Paulo, SP, Brazil

Abstract

Multiple sequence alignment is a fundamental step in the study of molecular phylogenies and of genomes. The resulting alignments, however, generally require manual editing. This is due to many factors, in particular the homogeneous scoring of pairings, deletions and insertions along the sequences. To ameliorate this problem, Goldman and Löytynoja proposed modifications to the scoring of insertions and deletions and the use of the pair-HMM probabilistic model, describing coding regions, slow-evolving regions and fast-evolving regions separately. This achieved substantial improvements in alignment quality. However, the model still has some problems. In particular, the states of the pair-HMM cannot adequately model the length of coding regions. In gene prediction, this problem was solved using Generalized HMMs (GHMMs), in which the run length of each state can be modeled with an arbitrary distribution. Recently, Pachter proposed a generalization of GHMMs for aligning sequence pairs (GPHMMs). This research project aims to use GPHMMs to improve multiple alignments, generalizing the approach proposed by Goldman and Löytynoja. The first step is to implement GPHMMs by extending ToPS, a framework developed by our group for the generic implementation of GHMMs for modeling ab initio gene predictors. In a second step, we will embed the resulting GPHMMs in a multiple alignment platform. In a third step, this platform will be used to improve automatic multiple sequence alignments. Although the use of GPHMMs for gene prediction can be found in the literature, their use in multiple sequence alignment is, as far as we are aware, new.
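
The length-modeling limitation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from ToPS or from the project itself; the function names, the self-loop probability of 0.9 and the bell-shaped weights peaking at length 20 are hypothetical values chosen only for illustration. In a plain HMM state, the run length follows a geometric distribution determined by the self-transition probability, so the most likely length is always 1. A generalized state can instead be given any explicit length distribution.

```python
import math


def geometric_duration_pmf(self_loop_prob, max_len):
    """Run-length distribution implied by a plain HMM state.

    Staying d steps means taking the self-loop d - 1 times and then leaving,
    so P(d) = p**(d - 1) * (1 - p) for d = 1..max_len.
    """
    p = self_loop_prob
    return [(p ** (d - 1)) * (1 - p) for d in range(1, max_len + 1)]


def explicit_duration_pmf(weights):
    """Normalize arbitrary non-negative weights into a length distribution,
    as a generalized (GHMM-style) state could use."""
    total = sum(weights)
    return [w / total for w in weights]


def most_likely_length(pmf):
    """Return the length (1-based) with the highest probability."""
    return 1 + max(range(len(pmf)), key=lambda i: pmf[i])


# Hypothetical plain HMM state with self-loop probability 0.9.
geo = geometric_duration_pmf(0.9, 50)

# Hypothetical generalized state: bell-shaped weights centered on length 20.
explicit = explicit_duration_pmf(
    [math.exp(-((d - 20) ** 2) / 50.0) for d in range(1, 51)]
)

print("most likely length, plain HMM state:", most_likely_length(geo))
print("most likely length, generalized state:", most_likely_length(explicit))
```

Whatever self-loop probability is chosen, the geometric distribution decreases monotonically with length, which is why a plain state fits poorly when typical region lengths cluster around a value larger than 1.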

Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
KASHIWABARA, ANDRE YOSHIAKI; BONADIO, IGOR; ONUCHIC, VITOR; AMADO, FELIPE; MATHIAS, RAFAEL; DURHAM, ALAN MITCHELL. ToPS: A Framework to Manipulate Probabilistic Models of Sequence Data. PLOS COMPUTATIONAL BIOLOGY, v. 9, n. 10, . (10/04409-2)
Academic Publications
(References retrieved automatically from State of São Paulo Research Institutions)
ONUCHIC, Vitor Ferreira. Inovações em técnicas de alinhamentos múltiplos e predições de genes. 2012. Master's Dissertation - Universidade de São Paulo (USP). Instituto de Matemática e Estatística (IME/SBI) São Paulo.