Machine learning for predictive analyses in health: an example of an application to predict death in the elderly in São Paulo, Brazil

Hellen Geremias dos Santos; Carla Ferreira do Nascimento; Rafael Izbicki; Yeda Aparecida de Oliveira Duarte; Alexandre Dias Porto Chiavegatto Filho

Author(s):	Hellen Geremias dos Santos [1]; Carla Ferreira do Nascimento [2]; Rafael Izbicki [3]; Yeda Aparecida de Oliveira Duarte [4]; Alexandre Dias Porto Chiavegatto Filho [5]. Total Authors: 5
Affiliation:	[1] Universidade de São Paulo. Faculdade de Saúde Pública - Brasil; [2] Universidade de São Paulo. Faculdade de Saúde Pública - Brasil; [3] Universidade Federal de São Carlos. Centro de Ciências Exatas e de Tecnologia - Brasil; [4] Universidade de São Paulo. Escola de Enfermagem - Brasil; [5] Universidade de São Paulo. Faculdade de Saúde Pública - Brasil. Total Affiliations: 5
Document type:	Journal article
Source:	Cadernos de Saúde Pública; v. 35, n. 7, 2019-07-29.
Abstract
This study presents the stages involved in using machine learning algorithms for predictive analyses in health. The application used a database of elderly residents of the city of São Paulo, Brazil, who participated in the Health, Well-Being, and Aging Study (SABE) (n = 2,808). The outcome variable was death within five years of the participant's entry into the study (n = 423), and the predictors were 37 variables describing the participant's demographic, socioeconomic, and health profile. The application was organized in the following stages: splitting the data into training (70%) and testing (30%) sets, pre-processing the predictors, learning, and assessing the models. The learning stage used five algorithms to fit the models: logistic regression with and without penalization, neural networks, gradient boosted trees, and random forest. Each algorithm's hyperparameters were tuned by 10-fold cross-validation, and the values yielding the best model were selected. For each algorithm, the best model was assessed on the test data by the area under the ROC curve (AUC ROC) and related measures. All the models had an AUC ROC greater than 0.70. For the three models with the highest AUC ROC (in order: neural networks, logistic regression with LASSO penalization, and logistic regression without penalization), measures of the quality of the predicted probabilities were also assessed. The expectation is that, as more data and trained human capital become available, it will be possible to develop predictive machine learning models with the potential to help health professionals make the best decisions.
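The stages listed in the abstract (a 70/30 split, hyperparameter tuning by 10-fold cross-validation, and assessment by AUC ROC on the test data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' code: the data are synthetic (not SABE), the only model is a logistic regression with an L2 penalty fit by gradient descent (a stand-in for the LASSO penalization named in the abstract), the penalty grid is arbitrary, and the cross-validated AUC is computed on pooled out-of-fold predictions. All function and variable names are hypothetical.

```python
import math
import random

def auc_roc(y_true, scores):
    """AUC ROC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def train_test_split(n, test_frac, rng):
    """Shuffle row indices and hold out test_frac of them for testing."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]

def k_folds(indices, k):
    """Partition the indices into k disjoint folds."""
    return [indices[i::k] for i in range(k)]

def predict(w, xi):
    """Predicted probability; the last weight is the intercept."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, penalty, epochs=200, lr=0.1):
    """Logistic regression with an L2 penalty, fit by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        for j in range(len(w)):
            reg = penalty * w[j] if j < len(w) - 1 else 0.0  # intercept unpenalized
            w[j] -= lr * (grad[j] / len(X) + reg)
    return w

def cv_auc(X, y, train_idx, penalty, k=10):
    """AUC ROC of pooled out-of-fold predictions from k-fold cross-validation."""
    folds = k_folds(train_idx, k)
    labels, scores = [], []
    for f, val in enumerate(folds):
        fit = [i for g, fold in enumerate(folds) if g != f for i in fold]
        w = fit_logistic([X[i] for i in fit], [y[i] for i in fit], penalty)
        labels += [y[i] for i in val]
        scores += [predict(w, X[i]) for i in val]
    return auc_roc(labels, scores)

# Synthetic data (an assumption): two predictors, outcome driven by the first.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
y = [int(rng.random() < 1 / (1 + math.exp(-(1.5 * x[0] - 1.5)))) for x in X]

# Stage 1: 70% training, 30% testing.
train_idx, test_idx = train_test_split(len(X), 0.30, rng)

# Stage 2: choose the penalty by 10-fold cross-validation on the training data.
grid = [0.0, 0.01, 0.1]
best_penalty = max(grid, key=lambda lam: cv_auc(X, y, train_idx, lam))

# Stage 3: refit on all training data and assess once on the held-out test data.
w = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx], best_penalty)
test_auc = auc_roc([y[i] for i in test_idx], [predict(w, X[i]) for i in test_idx])
print(f"selected penalty: {best_penalty}, test AUC ROC: {test_auc:.3f}")
```

The test rows are touched only once, after the penalty has been chosen, which is what keeps the reported AUC ROC an estimate of performance on unseen data. In practice a library such as scikit-learn would replace the hand-written pieces.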

FAPESP's process:	17/09369-8 - Cause-specific mortality prediction with machine learning on a longitudinal sample of 502,632 individuals
Grantee:	Alexandre Dias Porto Chiavegatto Filho
Support Opportunities:	Regular Research Grants
