Treatment of imbalanced data for lncRNA classification

Grant number: 18/03853-8
Support Opportunities: Scholarships in Brazil - Scientific Initiation
Start date: March 01, 2018
End date: December 31, 2018
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: André Carlos Ponce de Leon Ferreira de Carvalho
Grantee: Jonas Coelho Kasmanas
Host Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Associated research grant: 13/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry, AP.CEPID

Abstract

Interest in long non-coding RNAs (lncRNAs) has grown in recent years. lncRNAs have consistently been shown to play a role in several types of genomic regulation and are therefore associated with several biological processes and diseases, among them cancer. To better understand the functions and mechanisms of this kind of RNA, several studies have applied classification algorithms to distinguish the different types of lncRNA, which are defined by their genomic location and mode of action. However, some limitations may undermine the performance of automated classification methods, among them class imbalance in the original data set. Traditional algorithms can have great difficulty correctly classifying examples of the minority class and instead favor the majority class, the one with the most examples. Several techniques for treating imbalanced data have therefore been proposed, including artificial balancing of the data and modification of traditional algorithms, among other approaches. The objective of this work is to treat imbalanced lncRNA data so that it is classified correctly, particularly with respect to the minority class. To this end, the most recent approaches for treating imbalanced data will be applied in order to identify the best treatment for classifying molecular biology data. (AU)
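
As an illustration of what "artificial balancing of data" can mean, the sketch below shows random oversampling, one of the simplest balancing techniques: minority-class examples are duplicated at random until every class is as large as the majority class. This is not the project's method; the function name `random_oversample`, the class labels, and the toy feature values are all hypothetical and chosen only for the example.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until
    every class has as many examples as the largest class."""
    rng = random.Random(seed)

    # Group the feature vectors by class label.
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)

    # Size of the majority class: the target size for every class.
    target = max(len(rows) for rows in by_class.values())

    out_x, out_y = [], []
    for y, rows in by_class.items():
        # Sample with replacement to fill the gap (empty for the majority class).
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy data: 6 majority-class and 2 minority-class examples.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y = ["intergenic"] * 6 + ["antisense"] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

After balancing, both classes contain six examples. Oversampling of this kind should be applied only to the training split, since duplicated examples leaking into the test split would inflate the measured performance on the minority class.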
