A preprocessing Shapley value-based approach to detect relevant and disparity prone features in machine learning

Author(s):
Pelegrina, Guilherme Dean; Couceiro, Miguel; Duarte, Leonardo Tomazeli
Total number of authors: 3
Document type: Scientific article
Source: PROCEEDINGS OF THE 2024 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, ACM FACCT 2024; 11 pp., 2024-01-01.
Abstract

Decision support systems have become ubiquitous in every aspect of human life. Their reliance on increasingly complex and opaque machine learning models raises transparency and fairness concerns with respect to unprivileged groups of people. This has motivated several efforts to estimate the importance of features towards a model's performance and to detect unfair/disparate decisions. The latter is often dealt with by means of fairness metrics that rely on performance metrics with respect to predefined features that are considered protected (salient features such as age, gender, ethnicity, etc.) and/or sensitive (such as education, occupation, banking information). However, such an approach is subjective (fairness metrics depend on the chosen features), and there may be other features that lead to unfair (disparate) decisions and that call for suitable interpretations. In this paper we focus on the latter issues and propose a statistical preprocessing approach, inspired by both the Hilbert-Schmidt independence criterion and Shapley values, to estimate feature importance and to detect disparity prone features. Unlike traditional Shapley value-based approaches, ours does not require trained models to measure feature importance or detect disparate results. Instead, it focuses on data and statistical criteria to measure the dependence of feature distributions. Our empirical results show that the features with the highest degree of dependence with the label vector are also the ones with the highest impact on model performance. Moreover, they indicate that this relation enables the detection of disparity prone features.
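The core idea described in the abstract — Shapley values computed over feature coalitions, with a statistical dependence measure (HSIC) between a feature subset and the labels as the characteristic function, so that no trained model is required — can be sketched as follows. This is an illustrative toy implementation, not the authors' code: the Gaussian kernel, the bandwidth `sigma`, and the exact enumeration over all coalitions (feasible only for small feature counts) are assumptions.

```python
import itertools
from math import factorial

import numpy as np


def hsic(X, y, sigma=1.0):
    """Biased empirical HSIC between feature matrix X and labels y.

    Uses Gaussian kernels on both sides (a common choice; the
    bandwidth sigma is an assumption, not taken from the paper).
    """
    n = X.shape[0]

    def gram(Z):
        sq = np.sum(Z ** 2, axis=1)
        D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
        return np.exp(-D / (2.0 * sigma ** 2))

    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K = gram(X)
    L = gram(y.reshape(-1, 1))
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def shapley_hsic(X, y):
    """Exact Shapley values using HSIC(X_S, y) as the coalition value.

    Enumerates all 2^(d-1) coalitions per feature, so it is only
    practical for a small number of features d.
    """
    d = X.shape[1]
    phi = np.zeros(d)
    for i in range(d):
        rest = [j for j in range(d) if j != i]
        for r in range(len(rest) + 1):
            for S in itertools.combinations(rest, r):
                # Standard Shapley weight |S|! (d-|S|-1)! / d!
                w = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                v_with = hsic(X[:, list(S) + [i]], y)
                v_without = hsic(X[:, list(S)], y) if S else 0.0
                phi[i] += w * (v_with - v_without)
    return phi
```

On synthetic data where the label depends only on the first feature, that feature should receive the largest Shapley-HSIC score; by the efficiency axiom, the scores sum to the HSIC of the full feature set.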

FAPESP Process: 21/11086-0 - Interpretability and fairness in machine learning: capacity-based functions and interaction indices
Grantee: Guilherme Dean Pelegrina
Support type: Scholarships abroad - Research Internship - Post-doctorate
FAPESP Process: 20/10572-5 - New approaches to deal with fairness and transparency in machine learning problems
Grantee: Guilherme Dean Pelegrina
Support type: Scholarships in Brazil - Post-doctorate
FAPESP Process: 20/09838-0 - BI0S - Brazilian Institute of Data Science
Grantee: João Marcos Travassos Romano
Support type: Research Grants - Research Centers in Engineering Program