
The problem of data imbalance and transparency in supervised learning models applied to Credit Scoring.

Grant number: 23/06883-3
Support Opportunities: Scholarships in Brazil - Scientific Initiation
Start date: August 01, 2023
End date: July 31, 2024
Field of knowledge: Physical Sciences and Mathematics - Probability and Statistics - Applied Probability and Statistics
Principal Investigator: Adriano Kamimura Suzuki
Grantee: Gabriel Almeida Ferreira
Host Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil

Abstract

Supervised learning algorithms have been gaining relevance in the context of Credit Scoring. However, the databases used for Credit Scoring contain few examples of defaulters, which can lead learning models to misclassify a defaulter as a non-defaulter and consequently cause losses to the lender. This study therefore aims to investigate two approaches to the imbalance problem: artificially balancing the data with the ADASYN (He et al., 2008) and ENN (Wilson, 1972) algorithms, or modifying the supervised learning models themselves, using a generalized linear model with logit link function by Lemonte and Bazán (2018) and XGBoost with the focal loss function by Wang et al. (2020). A further problem of supervised learning models is the interpretability of black-box models. To address it, SHAP (Lundberg and Lee, 2017) will be used to explain the predictions generated by these models.
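
To illustrate why a focal loss can help with few defaulters, the following is a minimal sketch of the standard binary focal loss compared with ordinary cross-entropy. It is not the project's implementation, and the exact formulation used by Wang et al. (2020) with XGBoost may differ. The function names and the values of `gamma` and `alpha` are illustrative assumptions.

```python
import math


def cross_entropy(y_true, p_pred, eps=1e-12):
    """Ordinary binary cross-entropy for a single example."""
    p = min(max(p_pred, eps), 1.0 - eps)      # clip to avoid log(0)
    p_t = p if y_true == 1 else 1.0 - p       # probability given to the true class
    return -math.log(p_t)


def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.5, eps=1e-12):
    """Binary focal loss for a single example (illustrative parameter values).

    gamma >= 0 shrinks the loss of well-classified examples;
    alpha in (0, 1) weights the positive class (here: defaulters, y = 1).
    """
    p = min(max(p_pred, eps), 1.0 - eps)
    p_t = p if y_true == 1 else 1.0 - p
    alpha_t = alpha if y_true == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # A defaulter (y = 1) scored confidently right (0.9) versus badly wrong (0.1).
    for p in (0.9, 0.1):
        print(p, cross_entropy(1, p), focal_loss(1, p))
```

With `gamma = 0` and `alpha = 0.5` the focal loss reduces to half the cross-entropy; larger `gamma` makes easy, well-classified examples (typically the abundant non-defaulters) contribute relatively less, so hard examples dominate training.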
