Machine learning tools for bioinformatics problems

Grant number: 19/21300-9
Support type: Scholarships in Brazil - Doctorate
Effective date (Start): November 01, 2019
Effective date (End): October 31, 2020
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: André Carlos Ponce de Leon Ferreira de Carvalho
Grantee: Victor Alexandre Padilha
Home Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Associated research grant: 13/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry, AP.CEPID

Abstract

In recent years, machine learning techniques have been used extensively in bioinformatics because they can solve hard problems by learning, from a set of known examples, a function that makes predictions for new, unseen data. Motivated by these results, this project will tackle three bioinformatics problems using machine learning techniques: (i) the classification of CRISPR-associated (Cas) proteins, by extracting features from a set of sequences from different genomes. We will include the resulting tool in a CRISPR system classification pipeline that we have already developed and compare it with Hidden Markov Models, the technique currently used to label Cas proteins in the pipeline; (ii) the identification of translation initiation sites from ribosome-profiling data, for which we will develop a new tool. Based on a set of labeled data, we will extract peaks that characterize such sites and build a model to predict peaks in novel data; and (iii) the identification of long non-coding RNAs in plants, by extracting features from whole-genome alignments, to enable the prediction of conserved protein regions with conserved secondary structure. (AU)
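
As an illustration of what "extracting features from a set of sequences" and "learning a function from a set of known examples" can look like in practice, the sketch below turns protein sequences into normalized k-mer frequency vectors and classifies a new sequence by its nearest class centroid. This is not the project's method: the abstract does not name the features or the classifier, so the k-mer representation, the nearest-centroid rule, the function names, the toy sequences, and the labels `family_A` and `family_B` are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; k-mers containing other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_features(sequence, k=2):
    """Return normalized k-mer frequencies as a fixed-length vector (20**k entries)."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def train_centroids(examples, k=2):
    """Average the feature vectors of each label. `examples` is a list of (sequence, label)."""
    sums, n_per_label = {}, Counter()
    for sequence, label in examples:
        vec = kmer_features(sequence, k)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [a + b for a, b in zip(sums[label], vec)]
        n_per_label[label] += 1
    return {label: [v / n_per_label[label] for v in vec] for label, vec in sums.items()}


def predict(centroids, sequence, k=2):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    vec = kmer_features(sequence, k)

    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: distance(centroids[label]))


# Hypothetical toy data: made-up sequences and labels, not real Cas proteins.
training_examples = [
    ("AAAAKKKK", "family_A"),
    ("AAKKAAKK", "family_A"),
    ("WWWWYYYY", "family_B"),
    ("WYWYWYWY", "family_B"),
]
centroids = train_centroids(training_examples)
print(predict(centroids, "AAAKKK"))
print(predict(centroids, "WWYY"))
```

A fixed-length vector is used so that sequences of different lengths become comparable inputs to any standard classifier; the nearest-centroid rule is only the simplest stand-in for the learned function.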