A Graph-Based Clustering Analysis of the QM9 Dataset via SMILES Descriptors

Author(s):
Pinheiro, Gabriel A. ; Da Silva, Juarez L. F. ; Soares, Marinalva D. ; Quiles, Marcos G. ; Gervasi, O ; Murgante, B ; Misra, S ; Garau, C ; Blecic, I ; Taniar, D ; Apduhan, BO ; Rocha, AMAC ; Tarantino, E ; Torre, CM ; Karaca, Y
Total Authors: 15
Document type: Journal article
Source: COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2020, PT I; v. 12249, p. 13-pg., 2020-01-01.
Abstract

Machine learning has become a hot topic in materials science. For instance, several unsupervised and supervised learning approaches have been applied as surrogate models to study the properties of several classes of materials. Here, we investigate the QM9 quantum-chemistry dataset, one of the most widely used datasets in this setting, from a graph-based clustering perspective. Our investigation is twofold: 1) to understand whether the QM9 samples are organized in clusters, and 2) to determine whether the clustering structure provides insight into anomalous molecules, or molecules that jeopardize the accuracy of supervised property prediction methods. Our results show that QM9 is indeed structured into clusters. These clusters might, for instance, suggest better ways of splitting the dataset when using cross-validation approaches in supervised learning. Regarding our second question, however, our findings indicate that the clustering structure obtained via the Simplified Molecular Input Line Entry System (SMILES) representation cannot be used to filter anomalous samples in property prediction. Further investigation of this limitation should be conducted in future research. (AU)
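
As a rough illustration of what graph-based clustering over SMILES strings can look like, the sketch below links molecules whose strings are similar and reads clusters off as connected components of the resulting graph. It is not the authors' pipeline: the character-bigram descriptor, the Jaccard similarity, the 0.5 threshold, and all function names are assumptions chosen only to keep the example small and dependency-free.

```python
from itertools import combinations


def bigrams(smiles):
    """Set of character bigrams: a crude, assumed stand-in for a SMILES descriptor."""
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_graph(smiles_list, threshold=0.5):
    """Adjacency sets: connect i and j when their similarity reaches the threshold."""
    feats = [bigrams(s) for s in smiles_list]
    adj = {i: set() for i in range(len(smiles_list))}
    for i, j in combinations(range(len(smiles_list)), 2):
        if jaccard(feats[i], feats[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Clusters as connected components, found with an iterative depth-first search."""
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


# Toy input (not taken from QM9): three alcohols and two aromatic rings.
molecules = ["CCO", "CCCO", "CCCCO", "c1ccccc1", "c1ccccc1C"]
clusters = connected_components(similarity_graph(molecules))
print(clusters)  # [[0, 1, 2], [3, 4]]
```

On this toy input the alcohols and the aromatic molecules fall into separate components. A cluster-aware split would then assign whole components to the same fold, which is the kind of use the abstract hints at.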

FAPESP's process: 17/11631-2 - CINE: computational materials design based on atomistic simulations, meso-scale, multi-physics, and artificial intelligence for energy applications
Grantee: Juarez Lopes Ferreira da Silva
Support Opportunities: Research Grants - Research Centers in Engineering Program
FAPESP's process: 16/23642-6 - Characterization of Time-Varying Complex Networks
Grantee: Alessandra Marli Maria Morais Gouvêa
Support Opportunities: Scholarships in Brazil - Doctorate
FAPESP's process: 18/21401-7 - Multi-User Equipment approved in grant 2017/11631-2: cluster computational de alto desempenho - ENIAC
Grantee: Juarez Lopes Ferreira da Silva
Support Opportunities: Multi-user Equipment Program