

Generating Diverse Clustering Datasets with Targeted Characteristics

Author(s):
dos Santos Fernandes, Luiz Henrique ; Smith-Miles, Kate ; Lorena, Ana Carolina ; Xavier-Junior, JC ; Rios, RA
Total number of authors: 5
Document type: Scientific article
Source: INTELLIGENT SYSTEMS, PT I; v. 13653, 15 pp., 2022-01-01.
Abstract

When evaluating clustering algorithms, it is important to assess their performance in retrieving clusters of datasets with known structures. Nonetheless, generating and choosing diverse datasets to compose such test benchmarks is non-trivial. The datasets must present a large variety of structures and characteristics so that the algorithms can be challenged and their strengths and weaknesses can be revealed. The use of generators currently available in the literature relies on trial and error procedures that can be quite costly and inaccurate. Taking advantage of an Instance Space Analysis of popular clustering benchmarks, where datasets are projected into a 2-D embedding with linear trends according to different characteristics, we use a genetic algorithm to produce new datasets at targeted locations in the instance space. This is a natural extension of the Instance Space Analysis framework, and as a result, we are able to produce diverse datasets for composing test benchmarks for clustering. (AU)

FAPESP grant: 21/06870-3 - Beyond algorithm selection: meta-learning for data and algorithm analysis and understanding
Grantee: Ana Carolina Lorena
Support type: Research Grants - Young Investigators Grants - Phase 2