(Reference retrieved automatically from Web of Science through information on FAPESP grant and its corresponding number as mentioned in the publication by the authors.)

Correlation-Based Framework for Extraction of Insights from Quantum Chemistry Databases: Applications for Nanoclusters

Author(s):
Mucelini, Johnatan [1] ; Quiles, Marcos G. [2] ; Prati, Ronaldo C. [3] ; Da Silva, Juarez L. F. [1]
Total Authors: 4
Affiliation:
[1] Univ Sao Paulo, Sao Carlos Inst Chem, BR-13560970 Sao Carlos, SP - Brazil
[2] Univ Fed Sao Paulo, Dept Sci & Technol, BR-12247014 Sao Jose Dos Campos, SP - Brazil
[3] Fed Univ ABC, Ctr Math Computat & Cognit, BR-09210580 Santo Andre, SP - Brazil
Total Affiliations: 3
Document type: Journal article
Source: JOURNAL OF CHEMICAL INFORMATION AND MODELING; v. 61, n. 3, p. 1125-1135, MAR 22 2021.
Web of Science Citations: 0
Abstract

The amount of quantum chemistry (QC) data is increasing year by year due to the continuous increase of computational power and the development of new algorithms. However, in most cases, our atom-level knowledge of molecular systems has been obtained by manual data analyses based on selected descriptors. In this work, we introduce a data mining framework to accelerate the extraction of insights from QC datasets, which starts with a featurization process that converts atomic features into molecular properties (AtoMF). Then, it employs correlation coefficients (Pearson, Spearman, and Kendall) to investigate the relationship between the AtoMF features and a target property. We applied our framework to investigate three nanocluster systems, namely, PtₙTM₅₅₋ₙ, CeₙZr₁₅₋ₙO₃₀, and (CHₙ + mH)/TM₁₃. We found several interesting and consistent insights using the Spearman and Kendall correlation coefficients, indicating that they are suitable for our approach; however, our results indicate that the Pearson coefficient is very sensitive to outliers and should not be used. Moreover, we highlight problems that can occur during this analysis and discuss how to handle them. Finally, we make available a new Python package that implements the proposed QC data mining framework, which can be used as is or modified to include new features. (AU)
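The abstract's remark on outlier sensitivity can be illustrated with a minimal sketch. It is not the authors' package or its API: the function names (`pearson`, `spearman`, `kendall_tau_b`, `average_ranks`) and the toy data are assumptions made up for this illustration. It compares the three coefficients on a hypothetical feature whose target decreases monotonically except for one extreme value.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b from concordant/discordant pair counts (handles ties)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # pairs tied in both variables enter neither count
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + only_x_tied)
                      * (concordant + discordant + only_y_tied))
    return (concordant - discordant) / denom


# Toy data (invented for illustration, not taken from the paper):
# the target falls as the feature rises, except for one extreme outlier.
feature = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
target = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1000]

r_pearson = pearson(feature, target)
r_spearman = spearman(feature, target)
r_kendall = kendall_tau_b(feature, target)

print(f"Pearson:  {r_pearson:+.3f}")
print(f"Spearman: {r_spearman:+.3f}")
print(f"Kendall:  {r_kendall:+.3f}")
```

On this toy data the single outlier flips the sign of the Pearson coefficient, while the rank-based Spearman and Kendall coefficients still report the decreasing trend followed by nine of the ten points.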

FAPESP's process: 17/11631-2 - CINE: computational materials design based on atomistic simulations, meso-scale, multi-physics, and artificial intelligence for energy applications
Grantee: Juarez Lopes Ferreira da Silva
Support Opportunities: Research Grants - Research Centers in Engineering Program
FAPESP's process: 18/21401-7 - Multi-User Equipment approved in grant 2017/11631-2: high-performance computing cluster - ENIAC
Grantee: Juarez Lopes Ferreira da Silva
Support Opportunities: Multi-user Equipment Program
FAPESP's process: 18/11152-0 - Catalyst design for direct conversion of methane to methanol: an ab initio Density Functional Theory investigation
Grantee: Karla Furtado Andriani
Support Opportunities: Scholarships in Brazil - Post-Doctoral