Scholarship 24/10958-1 - Annotation, Machine learning - BV FAPESP

Decoding the language of life: Towards protein classification and design through large language models

Grant number: 24/10958-1
Support Opportunities: Scholarships in Brazil - Doctorate (Direct)
Start date: November 01, 2024
End date: October 31, 2029
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: André Carlos Ponce de Leon Ferreira de Carvalho
Grantee: Breno Livio Silva de Almeida
Host Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Company: Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC)
Associated research grant: 20/09835-1 - IARA - Artificial Intelligence in the Remaking of Urban Environments, AP.PCPE

Abstract

Understanding the language of life, composed of DNA and its translation into amino acids, is a formidable challenge for traditional methodologies because of its inherent complexity. Consequently, there has been a notable shift towards alignment-free techniques, particularly machine learning (ML) approaches. Among these, deep learning, and specifically large language models (LLMs), has emerged as a pivotal tool for deciphering the language of genomics and proteomics. These models classify sequences and generate new ones without complex feature engineering, and they are versatile and effective in tasks such as DNA sequence classification and protein structure prediction. With their sophisticated architecture and exposure to vast amounts of data, LLMs may exhibit emergent abilities, such as protein design, extending their utility beyond conventional linguistic tasks to fields like biotechnology and drug development. Furthermore, integrating automated machine learning (AutoML), which streamlines ML tasks, with LLMs can democratize access to ML and enhance their capabilities. By using AutoML alongside LLMs, we can lower technical barriers and harness their combined potential for recommendations, fostering a more inclusive and efficient ML ecosystem. Our research advocates integrating AutoML with LLMs while investigating the conditions conducive to an emergent property like protein design. Such insights promise to expedite the development of artificial intelligence models tailored explicitly for protein engineering, pushing the boundaries of biological innovation.
