SMICLR: Contrastive Learning on Multiple Molecular Representations for Semisupervised and Unsupervised Representation Learning

Author(s):
Pinheiro, Gabriel A. ; Silva, Juarez L. F. ; Quiles, Marcos G.
Total number of authors: 3
Document type: Scientific article
Source: JOURNAL OF CHEMICAL INFORMATION AND MODELING; v. 62, n. 17, p. 13-pg., 2022-09-12.
Abstract

Machine learning as a tool for chemical space exploration broadens horizons to work with known and unknown molecules. At its core lies molecular representation, which is key to improving learning about structure-property relationships. Recently, contrastive frameworks have shown impressive results for representation learning in diverse domains. Therefore, this paper proposes a contrastive framework that embraces multimodal molecular data. Specifically, our approach jointly trains a graph encoder and an encoder for the simplified molecular-input line-entry system (SMILES) string to optimize the contrastive learning objective. Since SMILES is the basis of our method, i.e., we build the molecular graph from the SMILES, we call our framework SMILES Contrastive Learning (SMICLR). When stacking a nonlinear regressor on SMICLR's pretrained encoder and fine-tuning the entire model, we reduced the prediction error by, on average, 44% and 25% for the energetic and electronic properties of the QM9 data set, respectively, over the supervised baseline. We further improved our framework's performance by applying data augmentations to each molecular-input representation. Moreover, SMICLR demonstrated competitive representation learning results in an unsupervised setting. (AU)
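
The contrastive objective between the graph encoder and the SMILES encoder can be illustrated with a small sketch. The abstract does not state the exact loss, so the code below assumes a SimCLR-style normalized temperature-scaled cross-entropy (NT-Xent) loss; the function name `nt_xent_loss`, the `temperature` default, and the toy embeddings are illustrative assumptions, not the authors' implementation. Each molecule contributes two embeddings (one per representation), which form a positive pair, while all embeddings of other molecules in the batch act as negatives.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent_loss(graph_emb, smiles_emb, temperature=0.5):
    """Assumed NT-Xent loss over paired graph/SMILES embeddings.

    graph_emb[i] and smiles_emb[i] are embeddings of the same molecule
    produced by two hypothetical encoders (not defined here).
    """
    n = len(graph_emb)
    # Stack both views: indices 0..n-1 are graph views, n..2n-1 are SMILES views.
    z = [_normalize(v) for v in graph_emb] + [_normalize(v) for v in smiles_emb]
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same molecule
        # Cosine similarities (vectors are unit length), scaled by temperature.
        sims = [
            sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
        ]
        # The denominator covers every embedding except the anchor itself.
        denom = sum(math.exp(sims[k]) for k in range(2 * n) if k != i)
        total += -(sims[pos] - math.log(denom))
    return total / (2 * n)


# Toy check: matching views should score a lower loss than mismatched ones.
graph_views = [[1.0, 0.0], [0.0, 1.0]]
aligned = nt_xent_loss(graph_views, [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent_loss(graph_views, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch, `aligned` is smaller than `swapped`, which is the behavior a contrastive objective should reward: the two representations of one molecule are pulled together while different molecules are pushed apart. In practice the embeddings would come from trainable encoders and the loss would be minimized by gradient descent.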

FAPESP grant: 18/21401-7 - Multi-user equipment (EMU) granted under process 2017/11631-2: high-performance computing cluster - ENIAC
Grantee: Juarez Lopes Ferreira da Silva
Support type: Research Grants - Multi-user Equipment Program
FAPESP grant: 17/11631-2 - CINE: computational materials design based on atomistic simulations, meso-scale, multi-physics, and artificial intelligence for energy applications
Grantee: Juarez Lopes Ferreira da Silva
Support type: Research Grants - Research Centers in Engineering Program
FAPESP grant: 21/08852-2 - Molecular property prediction with high accuracy: a semi-supervised learning approach
Grantee: Gabriel Augusto Lins Leal Pinheiro
Support type: Scholarships in Brazil - Doctorate