Machine learning tools for bioinformatics problems

Grant number: 19/21300-9
Support type: Scholarships in Brazil - Doctorate
Effective date (Start): November 01, 2019
Effective date (End): September 30, 2020
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal researcher: André Carlos Ponce de Leon Ferreira de Carvalho
Grantee: Victor Alexandre Padilha
Home Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Associated research grant: 13/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry, AP.CEPID


In recent years, machine learning techniques have been used extensively in bioinformatics because they can solve hard problems by learning, from a set of known examples, a function that makes predictions for new, unseen data. Motivated by these results, this project will tackle three bioinformatics problems using machine learning techniques: (i) the classification of CRISPR-associated (Cas) proteins, by extracting features from a set of sequences from different genomes. We will include the resulting tool in a CRISPR system classification pipeline that we have already developed and compare it with Hidden Markov Models, the technique currently used for labeling Cas proteins in the pipeline; (ii) the identification of translation initiation sites from ribosome-profiling data, for which we will develop a new tool. Based on a set of labeled data, we will extract peaks that characterize such sites and build a model to predict peaks in novel data; and (iii) the identification of long non-coding RNAs in plants, by extracting features from whole-genome alignments to enable the prediction of conserved protein regions with conserved secondary structure. (AU)

Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
PADILHA, VICTOR A.; ALKHNBASHI, OMER S.; SHAH, SHIRAZ A.; DE CARVALHO, ANDRE C. P. L. F.; BACKOFEN, ROLF. CRISPRcasIdentifier: Machine learning for accurate identification and classification of CRISPR-Cas systems. GIGASCIENCE, v. 9, n. 6, JUN 2020. Web of Science Citations: 0.
