A Graph-Based Clustering Analysis of the QM9 Dataset via SMILES Descriptors

Author(s):
Pinheiro, Gabriel A. ; Da Silva, Juarez L. F. ; Soares, Marinalva D. ; Quiles, Marcos G. ; Gervasi, O ; Murgante, B ; Misra, S ; Garau, C ; Blecic, I ; Taniar, D ; Apduhan, BO ; Rocha, AMAC ; Tarantino, E ; Torre, CM ; Karaca, Y
Total number of authors: 15
Document type: Scientific article
Source: COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2020, PT I; v. 12249, p. 13-pg., 2020-01-01.
Abstract

Machine learning has become a hot topic in Materials Science. For instance, several unsupervised and supervised learning approaches have been applied as surrogate models to study the properties of several classes of materials. Here, we investigate the Quantum QM9 dataset from a graph-based clustering perspective. This dataset is one of the most widely used in this context. Our investigation is two-fold: 1) to understand whether the QM9 samples are organized in clusters, and 2) to determine whether the clustering structure might provide insights regarding anomalous molecules, or molecules that jeopardize the accuracy of supervised property prediction methods. Our results show that QM9 is indeed structured into clusters. These clusters, for instance, might suggest better approaches for splitting the dataset when using cross-correlation approaches in supervised learning. However, regarding our second question, our findings indicate that the clustering structure, obtained via the Simplified Molecular Input Line Entry System (SMILES) representation, cannot be used to filter anomalous samples in property prediction. Thus, further investigation of this limitation should be conducted in future research. (AU)
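
As a rough illustration of what graph-based clustering over SMILES strings can look like, here is a minimal sketch. This record does not state which descriptor, similarity measure, or clustering algorithm the authors used, so everything below is an assumption made for illustration: character bigrams serve as a crude stand-in descriptor, Jaccard similarity defines the edges, the 0.5 threshold is arbitrary, and clusters are simply the connected components of the resulting graph. The names `char_ngrams`, `jaccard`, and `cluster_smiles` are hypothetical.

```python
from itertools import combinations


def char_ngrams(smiles, n=2):
    """Set of character n-grams of a SMILES string (assumed stand-in descriptor)."""
    if len(smiles) < n:
        return {smiles}
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets; 1.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_smiles(smiles_list, threshold=0.5, n=2):
    """Connect molecules whose similarity is >= threshold (assumed value)
    and return the connected components of that graph as clusters."""
    feats = [char_ngrams(s, n) for s in smiles_list]
    parent = list(range(len(smiles_list)))  # union-find forest

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Add an edge (merge components) for every sufficiently similar pair.
    for i, j in combinations(range(len(smiles_list)), 2):
        if jaccard(feats[i], feats[j]) >= threshold:
            parent[find(i)] = find(j)

    # Group molecules by the root of their component, in input order.
    clusters = {}
    for i, s in enumerate(smiles_list):
        clusters.setdefault(find(i), []).append(s)
    return list(clusters.values())


molecules = ["CCO", "CCCO", "CCCCO", "c1ccccc1", "c1ccccc1C"]
for group in cluster_smiles(molecules):
    print(group)
```

On this toy input the aliphatic alcohols and the aromatic rings fall into separate components. The pairwise loop is quadratic in the number of molecules, so a dataset the size of QM9 would need a sparser graph construction (for example, nearest-neighbor links) than this sketch uses.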

FAPESP grant: 17/11631-2 - CINE: computational development of materials using atomistic simulations, meso-scale, multi-physics, and artificial intelligence for energy applications
Grantee: Juarez Lopes Ferreira da Silva
Support type: Research Grants - Engineering Research Centers Program
FAPESP grant: 16/23642-6 - Characterization of Dynamic Complex Networks
Grantee: Alessandra Marli Maria Morais Gouvêa
Support type: Scholarships in Brazil - Doctorate
FAPESP grant: 18/21401-7 - EMU granted under grant 2017/11631-2: high-performance computing cluster - ENIAC
Grantee: Juarez Lopes Ferreira da Silva
Support type: Research Grants - Multi-user Equipment Program