
Active learning in hierarchical classification of transposable elements

Grant number: 17/19264-9
Support type: Scholarships abroad - Research Internship - Master's degree
Effective date (Start): November 01, 2017
Effective date (End): April 30, 2018
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: Ricardo Cerri
Grantee: Felipe Kenji Nakano
Supervisor abroad: Celine Vens
Home Institution: Centro de Ciências Exatas e de Tecnologia (CCET). Universidade Federal de São Carlos (UFSCAR). São Carlos, SP, Brazil
Research place: University of Leuven, Kulak Kortrijk (KU Leuven), Belgium
Associated to the scholarship: 16/12489-2 - Deep learning for hierarchical classification of transposable elements, BP.MS

Abstract

Transposable Elements (TEs) are DNA sequences capable of moving within a cell's genome. Such movement causes genetic variability and changes in gene functionality. TE classification is usually performed with homology tools, which search for similar sequences by matching them in a string-like fashion; this approach, however, ignores many biochemical and hierarchical properties. Recently, TE classification was proposed as a Machine Learning (ML) problem. More specifically, TEs are classified using Hierarchical Classification (HC) methods. Unlike traditional classification, HC addresses problems whose classes are structured in a hierarchy. Such methods have proved to be more efficient and feasible than homology; however, ML methods require labelled data, and labelling TEs is not an easy task. Repbase, the most widely accepted TE repository in academia, employs extensive validation and multiple tools for TE classification. This process is computationally and financially demanding, which leaves plenty of sequences unlabelled. As a countermeasure, the field of Active Learning (AL) provides methods for using unlabelled data. Basically, an AL algorithm employs strategies that select the most valuable unlabelled instances to be labelled. Hence the cost of labelling is reduced, and classifiers are likely to learn from the most representative instances. In this research, we plan to investigate AL algorithms for HC; in particular, we will integrate AL into the state-of-the-art method for HC, Clus-HMC.
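
To illustrate the selection step that an AL algorithm performs, here is a minimal sketch of one common query strategy, entropy-based uncertainty sampling. It is an assumption chosen for illustration, not the method investigated in this project: it ignores the class hierarchy and does not involve Clus-HMC. The function names and the probability values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predicted_probs, k):
    """Return the indices of the k unlabelled instances whose predicted
    class distributions have the highest entropy (most uncertain first)."""
    ranked = sorted(
        range(len(predicted_probs)),
        key=lambda i: entropy(predicted_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical class probabilities that some classifier assigned to four
# unlabelled sequences (illustrative values, not real TE data).
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # almost uniform, very uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

# Indices of the two sequences that would be sent for labelling.
query = select_most_uncertain(pool_probs, 2)
print(query)
```

In a full AL loop, the selected instances would be labelled, added to the training set, and the classifier retrained before the next selection round.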