

An Instance Level Analysis of Classification Difficulty for Unlabeled Data

Author(s):
Ueda, Patricia S. M. ; Rivolli, Adriano ; Lorena, Ana Carolina
Total number of authors: 3
Document type: Scientific article
Source: INTELLIGENT SYSTEMS, BRACIS 2024, PT I; v. 15412, 15 pages, 2025-01-01.
Abstract

Instance hardness measures allow us to assess and understand why some observations in a dataset are difficult to classify. With this information, one may curate and cleanse the training dataset to improve data quality. However, these measures require labeled data, which limits their use in the deployment stage, when data is unlabeled. This paper investigates whether it is possible to identify observations that will be hard to classify regardless of their label. To this end, two approaches are tested. The first adapts known instance hardness measures to the unlabeled scenario. The second learns regression metamodels to estimate the instance hardness of new observations. In experiments, both approaches were better at identifying instances lying in borderline regions of the dataset, which pose a greater difficulty when the label is unknown.
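
The second approach can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the hardness measures nor the regressors used. As assumptions, the sketch takes k-Disagreeing Neighbors (kDN) as the instance hardness measure and a plain k-nearest-neighbor regressor as the metamodel. The function names and the toy data are hypothetical.

```python
import math


def _nearest(point, points, k, skip=None):
    """Indices of the k points closest to `point` (Euclidean distance).

    `skip` optionally excludes one index, e.g. the point itself.
    """
    candidates = (i for i in range(len(points)) if i != skip)
    ranked = sorted(candidates, key=lambda i: math.dist(point, points[i]))
    return ranked[:k]


def kdn(X, y, k=3):
    """kDN hardness: fraction of an instance's k nearest neighbors
    whose label differs from the instance's own label (needs labels)."""
    scores = []
    for i, x in enumerate(X):
        neighbors = _nearest(x, X, k, skip=i)
        scores.append(sum(y[j] != y[i] for j in neighbors) / k)
    return scores


def estimate_hardness(x_new, X, hardness, k=3):
    """Assumed metamodel: k-NN regression on the labeled hardness values.

    Predicts the hardness of an unlabeled observation as the mean
    hardness of its k nearest labeled training instances.
    """
    neighbors = _nearest(x_new, X, k)
    return sum(hardness[j] for j in neighbors) / k


# Toy labeled training set: two clusters plus two borderline points.
X_train = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
           (0.9, 1.0), (1.0, 0.8), (1.1, 1.1),
           (0.5, 0.5), (0.6, 0.5)]
y_train = [0, 0, 0, 1, 1, 1, 0, 1]

# Step 1: hardness is computed where labels are available.
hardness = kdn(X_train, y_train, k=3)

# Step 2: hardness is estimated for unlabeled observations.
border = estimate_hardness((0.55, 0.5), X_train, hardness, k=3)
core = estimate_hardness((0.1, 0.1), X_train, hardness, k=3)
print(f"borderline estimate: {border:.3f}, cluster-core estimate: {core:.3f}")
```

On this toy data the observation placed between the two clusters receives a higher estimate than the one inside a cluster, which matches the abstract's remark about borderline regions. Any regressor could stand in for the k-NN metamodel here; it was chosen only to keep the sketch dependency-free.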

FAPESP grant: 21/06870-3 - Beyond algorithm selection: meta-learning for data and algorithm analysis and understanding
Grantee: Ana Carolina Lorena
Type of support: Research Grants - Young Investigators Grants - Phase 2