(Reference retrieved automatically from Web of Science, based on the FAPESP funding acknowledgment and the corresponding grant number included in the publication by the authors.)

Machine Learning Prediction of Nine Molecular Properties Based on the SMILES Representation of the QM9 Quantum-Chemistry Dataset

Author(s):
Pinheiro, Gabriel A. [1] ; Mucelini, Johnatan [2] ; Soares, Marinalva D. [3] ; Prati, Ronaldo C. [4] ; Da Silva, Juarez L. F. [2] ; Quiles, Marcos G. [5]
Total number of authors: 6
Author affiliation(s):
[1] Natl Inst Space Res, Associate Lab Comp & Appl Math, BR-12227010 Sao Jose Dos Campos, SP - Brazil
[2] Univ Sao Paulo, Sao Carlos Inst Chem, BR-13560970 Sao Carlos, SP - Brazil
[3] Fed Univ Sao Paulo UNIFESP, Inst Sci & Technol, BR-12247014 Sao Jose Dos Campos, SP - Brazil
[4] Fed Univ ABC, Ctr Math Computat & Cognit, BR-09210580 Santo Andre, SP - Brazil
[5] Univ Fed Sao Paulo, Inst Sci & Technol, BR-12247014 Sao Jose Dos Campos, SP - Brazil
Total number of affiliations: 5
Document type: Scientific article
Source: Journal of Physical Chemistry A; v. 124, n. 47, p. 9854-9866, NOV 25 2020.
Web of Science citations: 1
Abstract

Machine learning (ML) models can potentially accelerate the discovery of tailored materials by learning a function that maps chemical compounds to their respective target properties. In this realm, a crucial step is encoding the molecular systems for the ML model, in which the molecular representation plays a central role. Most representations are based on atomic coordinates (structure); however, this can increase the computational cost of ML training and prediction. Herein, we investigate the impact of choosing coordinate-free descriptors based on the Simplified Molecular Input Line Entry System (SMILES) representation, which can substantially reduce the computational cost of ML predictions. To that end, we evaluate the prediction performance of a feed-forward neural network (FNN) model over five feature selection methods and nine ground-state properties (including energetic, electronic, and thermodynamic properties) from a public data set composed of approximately 130k organic molecules. Our best results reached a mean absolute error, close to chemical accuracy, of approximately 0.05 eV for the atomization energies (internal energy at 0 K, internal energy at 298.15 K, enthalpy at 298.15 K, and free energy at 298.15 K). Moreover, for the atomization energies, the out-of-sample error was nine times lower than that of the same FNN model trained with the Coulomb matrix, a traditional coordinate-based descriptor. Furthermore, our results showed how the model's accuracy is limited by employing such a low-computational-cost representation, which carries less information about the molecular structure than most state-of-the-art methods.
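
To make the idea of a coordinate-free descriptor concrete, the following is a minimal sketch that turns a SMILES string into a fixed-length count vector without using any atomic coordinates. It is an illustration only: the token vocabulary `VOCAB`, the regular expression `TOKEN_RE`, and the function `smiles_features` are hypothetical names chosen here, and this is not the descriptor set or feature selection procedure used by the authors.

```python
import re
from collections import Counter

# Hypothetical vocabulary for illustration; NOT the descriptors used in the paper.
# Upper-case letters are aliphatic atoms, lower-case letters are aromatic atoms.
VOCAB = ["C", "N", "O", "F", "c", "n", "o", "=", "#", "(", ")", "ring_digit"]

# Match bracket atoms first, then single-letter atoms, bond symbols,
# branch parentheses, and single ring-closure digits.
TOKEN_RE = re.compile(r"\[[^\]]+\]|[CNOFcno]|[=#()]|\d")


def smiles_features(smiles):
    """Return token counts for a SMILES string, ordered as in VOCAB."""
    counts = Counter()
    for tok in TOKEN_RE.findall(smiles):
        if tok.isdigit():
            # Each ring closure contributes two digits (open and close).
            counts["ring_digit"] += 1
        else:
            # Tokens outside VOCAB (e.g. bracket atoms) are counted
            # but do not appear in the returned vector.
            counts[tok] += 1
    return [counts[t] for t in VOCAB]


if __name__ == "__main__":
    print(smiles_features("CC(=O)O"))   # acetic acid
    print(smiles_features("c1ccccc1"))  # benzene
```

A vector of this kind could serve as input to a feed-forward network in place of a coordinate-based descriptor such as the Coulomb matrix. As the abstract notes, such a representation carries less structural information, which limits the attainable accuracy.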

FAPESP grant: 17/11631-2 - CINE: computational materials design based on atomistic simulations, meso-scale, multi-physics, and artificial intelligence for energy applications
Grantee: Juarez Lopes Ferreira da Silva
Support type: Research Grants - Research Centers in Engineering Program