Guided Clustering for Selecting Representatives Samples in Chemical Databases

Author(s):
Calderan, Felipe V. ; de Mendonca, Joao Paulo A. ; Da Silva, Juarez L. F. ; Quiles, Marcos G.
Total number of authors: 4
Document type: Scientific article
Source: COMPUTATIONAL SCIENCE AND ITS APPLICATIONS-ICCSA 2023 WORKSHOPS, PART VIII; v. 14111, 17 pp., 2023-01-01.
Abstract

Machine Learning (ML) methods, from unsupervised to supervised algorithms, have been applied to several tasks in the Materials Science domain, such as property prediction, design of new chemical compounds, and surrogate models in molecular dynamics simulations. ML methods can also play a fundamental role in screening materials by reducing the number of compounds under scrutiny. This reduction assumes that compounds with similar representations under a given descriptor are likely to have similar properties; thus, an unsupervised ML method, such as the k-means algorithm, can cluster the data set and deliver a set of representative samples. However, this selection depends on the molecular representation, which might not directly relate to the target property. Here, we propose a framework that lets the specialist select a set of representative samples in a guided fashion. In particular, a loop between a clustering algorithm (k-means) and an optimization method (Basin-Hopping) is implemented, which allows the system to learn feature weights that form more homogeneous clusters with respect to the target property. The framework also offers other visual and textual functionalities to support the expert. We evaluate the proposed framework in two scenarios, and the results show that the guidance enhances cluster formation in both coarse (few large clusters) and fine (many small clusters) analyses.
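
The loop between clustering and weight optimization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: all names (`kmeans`, `heterogeneity`, `guided_weights`) are hypothetical, the objective (size-weighted within-cluster variance of the target) is an assumed measure of cluster homogeneity, and a plain Metropolis random search stands in for Basin-Hopping, omitting its local-minimization step. It uses only the Python standard library and a toy data set in which feature 0 determines the target and feature 1 is a wide, uninformative descriptor.

```python
import math
import random


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point seeding; returns labels."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def heterogeneity(weights, X, y, k):
    """Assumed objective: within-cluster variance of the target property
    after scaling each feature by |weight| and clustering with k-means."""
    scaled = [[abs(w) * v for w, v in zip(weights, row)] for row in X]
    labels = kmeans(scaled, k)
    total = 0.0
    for j in range(k):
        ys = [t for t, lab in zip(y, labels) if lab == j]
        if ys:
            mean = sum(ys) / len(ys)
            total += sum((t - mean) ** 2 for t in ys)
    return total / len(y)


def guided_weights(X, y, k, steps=300, step_size=0.5, temperature=0.05, seed=0):
    """Simplified stand-in for Basin-Hopping: random perturbation of the
    weights with Metropolis acceptance (no local minimization step)."""
    rng = random.Random(seed)
    current = [1.0] * len(X[0])
    f_cur = heterogeneity(current, X, y, k)
    best, f_best = current[:], f_cur
    for _ in range(steps):
        cand = [w + rng.uniform(-step_size, step_size) for w in current]
        f_cand = heterogeneity(cand, X, y, k)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if f_cand <= f_cur or rng.random() < math.exp(-(f_cand - f_cur) / temperature):
            current, f_cur = cand, f_cand
            if f_cur < f_best:
                best, f_best = current[:], f_cur
    return best, f_best


# Toy data: feature 0 equals the target; feature 1 is wide and uninformative.
X = [[float(i % 2), float(i)] for i in range(20)]
y = [float(i % 2) for i in range(20)]

baseline = heterogeneity([1.0, 1.0], X, y, k=2)  # unweighted clustering
best, f_best = guided_weights(X, y, k=2)
print("unweighted:", baseline, "guided:", f_best, "weights:", best)
```

With equal weights, k-means splits this toy set along the uninformative feature, so both clusters mix the two target values. Shrinking the weight of feature 1 lets the split follow feature 0 instead, which is the kind of guidance the framework aims for. Because the objective is piecewise constant in the weights, a derivative-free search is used here; a faithful version would replace `guided_weights` with a proper Basin-Hopping routine.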

FAPESP grant: 18/21401-7 - EMU (multi-user equipment) granted under grant 2017/11631-2: high-performance computing cluster - ENIAC
Grantee: Juarez Lopes Ferreira da Silva
Support type: Research Grants - Multi-user Equipment Program
FAPESP grant: 22/09285-7 - Exploring chemical space via semi-supervised learning for the generation of new materials
Grantee: Marcos Gonçalves Quiles
Support type: Research Grants - Regular
FAPESP grant: 17/11631-2 - CINE: computational materials design using atomistic simulations, meso-scale, multi-physics, and artificial intelligence for energy applications
Grantee: Juarez Lopes Ferreira da Silva
Support type: Research Grants - Engineering Research Centers Program