Research Grants 22/09285-7 - Machine learning, Semi-supervised learning

Abstract

The discovery of new materials is directly linked to the evolution of society. New materials enable advances ranging from the creation of new drugs to the development of electronic components for clean energy generation. In addition to the many materials already available in nature, a multitude of compounds can in theory be generated by combining ordinary chemical elements. However, this space of possibilities, called chemical space, is practically infinite, which makes exhaustive scrutiny of every candidate unfeasible. To facilitate the search for new materials, scientists have turned to various machine learning (ML) techniques. In the direct process, ML models are trained to predict specific properties of new materials. In the inverse design process, the model instead learns to generate new compounds from desired properties. Among the ML techniques available in the literature, generative models based on autoencoders have shown promising results. Recently, we proposed a generative model called the Supervised Grammatical Variational Autoencoder (SGVAE), which can perform both tasks described above: property prediction and molecule design. However, this model, like others in the literature, has limitations and usage restrictions: a) most models are intrinsically supervised; b) a broad study of molecular representations is lacking; c) the generated latent spaces have low navigability (sampling) and interpretability; d) there is no methodology for continuously adapting the model when new data are constantly added to the database; and e) models still need to be validated in real scenarios. To address some of these issues, new models based on Variational Autoencoders (VAEs) will be studied and developed to generate materials considering multiple representations.
A semi-supervised approach, in which the data are only partially labeled, will be used to train the models. Active learning techniques will also be considered to make better use of labeled data and to support continuous exploration of the chemical space. To improve the chemical and physical interpretation of the learned latent representation, qualitative and quantitative analyses of the VAEs will be performed. The models will be evaluated using public datasets and data generated within CINE (Center for Innovation on New Energies). Finally, this project is part of CINE's computational division (4), where the proponent is one of the principal researchers (Proc. 2017/11631-2). (AU)

Scientific publications (6)

(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)

BEZERRA, RAQUEL C.; CALDERAN, FELIPE V.; FELICIO-SOUSA, PRISCILLA; PERACA, CARINA S. T.; QUILES, MARCOS G.; DA SILVA, JUAREZ L. F. Exploring the Adsorption Properties of Small Molecules on CeZr-Based Nanoclusters. ACS OMEGA, v. 10, n. 37, p. 14-pg., 2025-09-13. (21/03357-3, 18/21401-7, 17/11631-2, 22/09285-7)

PRATI, RONALDO C.; RODRIGUES, BRUNO S. M.; ARAGAO, IBERIS; SOARES, THEREZA A.; QUILES, MARCOS G.; DA SILVA, JUAREZ L. F. The Impact of Interdisciplinary, Gender and Geographic Distributions on the Citation Patterns of the Journal of Chemical Information and Modeling. JOURNAL OF CHEMICAL INFORMATION AND MODELING, v. 64, n. 4, p. 5-pg., 2024-02-12. (22/09285-7, 17/11631-2, 18/21401-7, 21/04283-3)

QUILES, MARCOS G.; RIBEIRO, PIERO A. L.; PINHEIRO, GABRIEL A.; PRATI, RONALDO C.; DA SILVA, JUAREZ L. F. Enhancing Low-Cost Molecular Property Prediction with Contrastive Learning on SMILES Representations. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS-ICCSA 2024 WORKSHOPS, PT IX, v. 14823, p. 15-pg., 2024-01-01. (22/09285-7, 17/11631-2)

CALDERAN, FELIPE V.; DE MENDONCA, JOAO PAULO A.; DA SILVA, JUAREZ L. F.; QUILES, MARCOS G. Guided Clustering for Selecting Representatives Samples in Chemical Databases. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS-ICCSA 2023 WORKSHOPS, PART VIII, v. 14111, p. 17-pg., 2023-01-01. (18/21401-7, 22/09285-7, 17/11631-2)

PELIN CARDOSO, LUIS EDUARDO; DE CARVALHO, ANDRE C. P. DE LEON F.; QUILES, MARCOS G. Applying LSTM Recurrent Neural Networks to Predict Revenue. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS-ICCSA 2024, PT II, v. 14814, p. 15-pg., 2024-01-01. (20/09835-1, 22/09285-7)

BARROS DA SILVA, ARNALDO V.; SALDIVIA-SIRACUSA, CRISTINA; CARLOS DE SOUZA, EDUARDO SANTOS; DAMACENO ARAUJO, ANNA LUIZA; LOPES, MARCIO AJUDARTE; VARGAS, PABLO AGUSTIN; KOWALSKI, LUIZ PAULO; SANTOS-SILVA, ALAN ROGER; DE CARVALHO, ANDRE C. P. L. F.; QUILES, MARCOS G. Enhancing Explainability in Oral Cancer Detection with Grad-CAM Visualizations. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS-ICCSA 2024, PT I, v. 14813, p. 14-pg., 2024-01-01. (20/09835-1, 22/09285-7)

Grant number:	22/09285-7
Support Opportunities:	Regular Research Grants
Start date:	March 01, 2023
End date:	February 28, 2025
Field of knowledge:	Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques

Principal Investigator:	Marcos Gonçalves Quiles
Grantee:	Marcos Gonçalves Quiles

Host Institution:	Instituto de Ciência e Tecnologia (ICT). Universidade Federal de São Paulo (UNIFESP). Campus São José dos Campos. São José dos Campos, SP, Brazil

City of the host institution:	São José dos Campos

Associated researchers:	Juarez Lopes Ferreira da Silva; Ronaldo Cristiano Prati
