(Reference retrieved automatically from Web of Science, using the FAPESP funding information and the corresponding grant number that the authors included in the publication.)

A machine learning based framework to identify and classify long terminal repeat retrotransposons

Author(s):
Schietgat, Leander [1] ; Vens, Celine [2, 1, 3, 4] ; Cerri, Ricardo [5] ; Fischer, Carlos N. [6] ; Costa, Eduardo [7, 1] ; Ramon, Jan [1, 8] ; Carareto, Claudia M. A. [9] ; Blockeel, Hendrik [1]
Total number of authors: 8
Author affiliation(s):
[1] Katholieke Univ Leuven, Dept Comp Sci, Leuven - Belgium
[2] KU Leuven Kulak, Dept Publ Hlth & Primary Care, Kortrijk - Belgium
[3] Univ Ghent, Dept Resp Med, Ghent - Belgium
[4] VIB Inflammat Res Ctr, Ghent - Belgium
[5] UFSCar Fed Univ Sao Carlos, Dept Comp Sci, Sao Carlos, SP - Brazil
[6] UNESP Sao Paulo State Univ, Dept Stat Appl Math & Comp Sci, Rio Claro, SP - Brazil
[7] Univ Sao Paulo, Inst Ciencias Matemat & Computacao, Sao Carlos, SP - Brazil
[8] INRIA, Lille Nord Europe, 40 Ave Halley, F-59650 Villeneuve d'Ascq - France
[9] UNESP Sao Paulo State Univ, Dept Biol, Sao Jose Do Rio Preto, SP - Brazil
Total number of affiliations: 9
Document type: Journal article
Source: PLOS COMPUTATIONAL BIOLOGY; v. 14, n. 4 APR 2018.
Web of Science citations: 2
Abstract

Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-LEARNER, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework towards LTR retrotransposons, a particular type of TEs characterized by having long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: REPEATMASKER, CENSOR and LTRDIGEST. In contrast to these methods, TE-LEARNER is the first to incorporate machine learning techniques, outperforming these methods in terms of predictive performance, while able to learn models and make predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-LEARNER's predictions which did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE. (AU)

FAPESP grant: 15/14300-1 - Hierarchical classification of transposable elements using machine learning
Grantee: Ricardo Cerri
Support type: Regular Research Grants
FAPESP grant: 12/24774-2 - Application of Hidden Markov Models to transposable elements
Grantee: Carlos Norberto Fischer
Support type: Scholarships abroad - Research
FAPESP grant: 13/15070-4 - Integrated mobilomics in Coffea and its most important insect pest: the coffee berry borer
Grantee: Claudia Marcia Aparecida Carareto
Support type: Regular Research Grants