Guided Clustering for Selecting Representatives Samples in Chemical Databases

Author(s):
Calderan, Felipe, V ; de Mendonca, Joao Paulo A. ; Da Silva, Juarez L. F. ; Quiles, Marcos G.
Total Authors: 4
Document type: Journal article
Source: COMPUTATIONAL SCIENCE AND ITS APPLICATIONS-ICCSA 2023 WORKSHOPS, PART VIII; v. 14111, p. 17-pg., 2023-01-01.
Abstract

Machine Learning (ML) methods, from unsupervised to supervised algorithms, have been applied to several tasks in the Materials Science domain, such as property prediction, design of new chemical compounds, and surrogate models in molecular dynamics simulations. ML methods can also play a fundamental role in screening materials by reducing the number of compounds under scrutiny. This reduction assumes that compounds similarly represented by a given descriptor might have similar properties; thus, an unsupervised ML method, such as the k-means algorithm, can cluster the data set and deliver a set of representative samples. However, this selection depends on the molecular representation, which might not directly relate to the target property. Here, we propose a framework that lets the specialist select a set of representative samples in a guided fashion. In particular, a loop between a clustering algorithm (k-means) and an optimization method (Basin-Hopping) is implemented, which allows the system to learn feature weights that form clusters more homogeneous with respect to the target property. The framework also offers other visual and textual functionalities to support the expert. We evaluate the proposed framework in two scenarios, and the results show that the guidance improves cluster formation in both coarse (few large clusters) and fine (many small clusters) analyses. (AU)
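
The loop between clustering and optimization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy data set with one informative and one noisy feature, a hand-written Lloyd's k-means, a simplified basin-hopping routine (random jump, coordinate-wise local search, Metropolis acceptance), and an objective defined as the within-cluster variance of the target property. All function names, parameters, and data below are hypothetical. In practice, `scipy.optimize.basinhopping` could take the place of the hand-written optimizer.

```python
import math
import random


def kmeans_labels(points, k, iters=25):
    """Plain Lloyd's k-means with a deterministic initialisation."""
    n = len(points)
    centers = [list(points[(i * n) // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def heterogeneity(weights, features, target, k):
    """Assumed objective: mean within-cluster squared deviation of the target."""
    scaled = [[w * x for w, x in zip(weights, row)] for row in features]
    labels = kmeans_labels(scaled, k)
    total = 0.0
    for c in range(k):
        vals = [t for t, lab in zip(target, labels) if lab == c]
        if vals:
            mean = sum(vals) / len(vals)
            total += sum((v - mean) ** 2 for v in vals)
    return total / len(target)


def local_search(f, w, step=0.5, min_step=0.05):
    """Coordinate-wise pattern search; keeps weights non-negative."""
    w, best = list(w), f(w)
    while step >= min_step:
        improved = False
        for j in range(len(w)):
            for delta in (step, -step):
                trial = list(w)
                trial[j] = max(0.0, trial[j] + delta)
                val = f(trial)
                if val < best:
                    w, best, improved = trial, val, True
        if not improved:
            step /= 2  # refine the step only when no move helped
    return w, best


def basin_hopping(f, w0, n_hops=20, hop_size=1.0, temperature=1.0, seed=0):
    """Simplified basin-hopping: jump, minimise locally, accept via Metropolis."""
    rng = random.Random(seed)
    current, current_val = local_search(f, w0)
    best, best_val = current, current_val
    for _ in range(n_hops):
        jumped = [max(0.0, w + rng.uniform(-hop_size, hop_size)) for w in current]
        cand, cand_val = local_search(f, jumped)
        if cand_val <= current_val or rng.random() < math.exp((current_val - cand_val) / temperature):
            current, current_val = cand, cand_val
        if cand_val < best_val:
            best, best_val = cand, cand_val
    return best, best_val


# Toy data (hypothetical): feature 0 tracks the target, feature 1 is noise.
data_rng = random.Random(42)
features, target = [], []
for i in range(40):
    group = i // 20
    features.append([group + data_rng.gauss(0, 0.05), data_rng.uniform(-3, 3)])
    target.append(10.0 * group + data_rng.gauss(0, 0.5))


def objective(w):
    return heterogeneity(w, features, target, k=2)


baseline = objective([1.0, 1.0])  # unguided clustering, equal feature weights
weights, guided = basin_hopping(objective, [1.0, 1.0])
print("equal weights:", baseline)
print("learned weights:", weights, "->", guided)
```

Because the search starts from equal weights and only keeps improvements as its best result, the guided objective can never be worse than the unguided baseline. How much it improves depends on the data and on the hypothetical settings chosen here.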

FAPESP's process: 18/21401-7 - Multi-User Equipment approved in grant 2017/11631-2: cluster computational de alto desempenho - ENIAC
Grantee: Juarez Lopes Ferreira da Silva
Support Opportunities: Multi-user Equipment Program
FAPESP's process: 22/09285-7 - Chemical space exploration via semi-supervised learning for design of new materials
Grantee: Marcos Gonçalves Quiles
Support Opportunities: Regular Research Grants
FAPESP's process: 17/11631-2 - CINE: computational materials design based on atomistic simulations, meso-scale, multi-physics, and artificial intelligence for energy applications
Grantee: Juarez Lopes Ferreira da Silva
Support Opportunities: Research Grants - Research Centers in Engineering Program