
Analysis and classification of human microbiomes: detection of bioindicators and optimization through machine learning

Grant number: 19/03396-9
Support type: Scholarships in Brazil - Doctorate (Direct)
Effective date (Start): June 01, 2019
Effective date (End): March 31, 2023
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: André Carlos Ponce de Leon Ferreira de Carvalho
Grantee: Jonas Coelho Kasmanas
Home Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Associated research grant: 13/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry, AP.CEPID


The microbial community that inhabits the various regions of the human body, such as the skin, gut, esophagus, mouth, and vagina, is called the human microbiome. These communities take part in processes essential to human health, such as digestion, nutrient absorption, detoxification, protection against pathogens, and regulation of the immune system. Consequently, changes in the microbiome can lead to various diseases, and analyzing the human microbiome can aid in the early diagnosis of several disorders. Diseases already correlated with the human microbiome include infections, colorectal and esophageal cancer, cirrhosis, and even autism and depression. For this reason, microbiome research has expanded rapidly, generating large amounts of information about these communities and giving rise to Human Microbiome Big Data (HM Big Data). At the same time, Machine Learning (ML), a data analysis approach in which algorithms receive a set of preprocessed data and extract models, patterns, and knowledge from it, has proven to be an efficient way of handling large volumes of information such as HM Big Data. Drawing on several of these approaches, this project will collect and select human microbiome metagenomes available in public databases. From these metagenomes, I will reconstruct Metagenome-Assembled Genomes (MAGs) using several bioinformatics pipelines. These data will then be used to build machine learning models capable of distinguishing health from disease. Finally, these models will be optimized through Automated Machine Learning (AutoML) techniques and analyzed with the aim of discovering new bioindicators of human diseases. (AU)
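The abstract does not say which features or classifiers the project uses. As a minimal sketch, assuming each sample has already been reduced to read counts per MAG and labelled healthy (0) or disease (1), the Python below normalizes counts to relative abundances, fits a plain logistic regression by gradient descent, and ranks MAGs by the magnitude of their weights as candidate bioindicators. The MAG names, counts, and helper functions (`relative_abundance`, `train_logistic`, `predict`, `rank_candidates`) are hypothetical, and logistic regression is a stand-in chosen for brevity, not the project's stated method.

```python
import math

# Hypothetical toy data: raw read counts mapped to three MAGs per sample.
# The MAG names and counts are invented for illustration only.
MAG_NAMES = ["MAG_A", "MAG_B", "MAG_C"]
SAMPLES = [
    ([70, 10, 20], 0), ([65, 15, 20], 0), ([60, 20, 20], 0),  # 0 = healthy
    ([10, 70, 20], 1), ([15, 65, 20], 1), ([20, 60, 20], 1),  # 1 = disease
]


def relative_abundance(counts):
    """Convert one sample's raw counts into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression (1 = disease) with full-batch gradient descent."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            # Prediction error of the current model on this sample.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(m):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the mean gradient of the log-loss.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (disease) if the linear score is non-negative, else 0 (healthy)."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0


def rank_candidates(w, names):
    """Order MAGs by absolute weight: larger |w| means more influence on the call."""
    return sorted(zip(names, w), key=lambda pair: abs(pair[1]), reverse=True)


X = [relative_abundance(counts) for counts, _ in SAMPLES]
y = [label for _, label in SAMPLES]
w, b = train_logistic(X, y)

print([predict(w, b, x) for x in X])  # -> [0, 0, 0, 1, 1, 1]
print(rank_candidates(w, MAG_NAMES))
```

Here the learning rate and number of epochs are fixed by hand; an AutoML step of the kind the abstract describes would instead search over model families and hyperparameters such as these automatically. Relative abundances are compositional (each sample sums to 1), so weight magnitudes on real data should be interpreted with caution before being treated as bioindicators.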