
Hierarchical classification of transposable elements using machine learning


Transposable elements (TEs) are DNA sequences that can move from one location to another within the genome of a cell. These elements contribute to the genetic diversity of species, and their transposition mechanisms may affect the functionality of genes. Correctly identifying and classifying these elements is useful for understanding their effects on the evolutionary process of genomes. TEs are organized in a hierarchical taxonomy comprising different families and superfamilies of elements. Usually, identification and classification are performed with bioinformatics tools based on homology, which compare a new sequence against a dataset of sequences containing previously identified TEs. Although widely used, this approach has disadvantages: homology between sequences ignores the many biochemical properties of the sequences, as well as the relationships between the different TE families and superfamilies. This project will therefore investigate and propose hierarchical classification methods for TEs using machine learning (ML) techniques. Different datasets will be constructed from nucleotide and amino acid sequences of previously identified TEs. To construct these datasets, bioinformatics tools designed to extract biochemical characteristics from sequences will be used. Different strategies for converting sequences into attribute values suitable for ML techniques will also be investigated. The datasets will then be hierarchically structured according to the TE families and superfamilies to which the sequences belong. The proposed classification methods will be compared with existing methods from the literature and evaluated using measures designed specifically for hierarchical classification problems. (AU)
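As an illustration only, the sketch below shows one possible way to turn a nucleotide sequence into attribute values (relative k-mer frequencies) and one possible hierarchical evaluation measure (hierarchical precision, recall and F-measure computed over ancestor-augmented label sets). The abstract does not say which representations or measures the project uses, so both choices are assumptions. The function names, the toy taxonomy in `PARENT`, and the default `k=3` are hypothetical and introduced here for the example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Relative frequency of every k-mer over `alphabet`, in lexicographic order.

    k-mers containing symbols outside `alphabet` (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical toy taxonomy (child -> parent), for illustration only.
PARENT = {
    "ClassI": None,
    "ClassII": None,
    "LTR": "ClassI",
    "Copia": "LTR",
    "Gypsy": "LTR",
    "TIR": "ClassII",
    "Tc1-Mariner": "TIR",
}


def with_ancestors(label, parent=PARENT):
    """Return the label together with all of its ancestors in the taxonomy."""
    nodes = set()
    while label is not None:
        nodes.add(label)
        label = parent[label]
    return nodes


def hierarchical_prf(true_labels, pred_labels, parent=PARENT):
    """Hierarchical precision, recall and F-measure over ancestor-augmented sets."""
    overlap = pred_total = true_total = 0
    for t, p in zip(true_labels, pred_labels):
        t_set = with_ancestors(t, parent)
        p_set = with_ancestors(p, parent)
        overlap += len(t_set & p_set)
        pred_total += len(p_set)
        true_total += len(t_set)
    hp = overlap / pred_total if pred_total else 0.0
    hr = overlap / true_total if true_total else 0.0
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf
```

With this kind of measure, predicting Gypsy for a Copia element is penalized less than predicting a Class II superfamily, because the two labels share the ancestors LTR and Class I. A flat accuracy score would treat both mistakes identically.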


Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
CERRI, RICARDO; BASGALUPP, MARCIO P.; BARROS, RODRIGO C.; DE CARVALHO, ANDRE C. P. L. F.. Inducing Hierarchical Multi-label Classification rules with Genetic Algorithms. APPLIED SOFT COMPUTING, v. 77, p. 584-604, . (16/50457-5, 15/14300-1)
SCHIETGAT, LEANDER; VENS, CELINE; CERRI, RICARDO; FISCHER, CARLOS N.; COSTA, EDUARDO; RAMON, JAN; CARARETO, CLAUDIA M. A.; BLOCKEEL, HENDRIK. A machine learning based framework to identify and classify long terminal repeat retrotransposons. PLOS COMPUTATIONAL BIOLOGY, v. 14, n. 4, . (15/14300-1, 13/15070-4, 12/24774-2)
CERRI, RICARDO; BARROS, RODRIGO C.; DE CARVALHO, ANDRE C. P. L. F.; JIN, YAOCHU. Reduction strategies for hierarchical multi-label classification in protein function prediction. BMC Bioinformatics, v. 17, . (15/14300-1)
