A Framework to Generate Synthetic Multi-label Datasets

Tomas, Jimena Torres; Spolaor, Newton; Cherman, Everton Alvares; Monard, Maria Carolina

Author(s):	Tomas, Jimena Torres; Spolaor, Newton; Cherman, Everton Alvares; Monard, Maria Carolina (total number of authors: 4)
Document type:	Scientific article
Source:	ELECTRONIC NOTES IN THEORETICAL COMPUTER SCIENCE; v. 302, 22 pp., 2014-02-25.
Abstract
A controlled environment, in which the properties of the dataset given to a learning algorithm are known, is useful for empirically evaluating machine learning algorithms. Synthetic (artificial) datasets are used for this purpose. Although there are publicly available frameworks to generate synthetic single-label datasets, this is not the case for multi-label datasets, in which each instance is associated with a set of labels that are usually correlated. This work presents Mldatagen, a multi-label dataset generator framework we have implemented, which is publicly available to the community. Currently, two strategies have been implemented in Mldatagen: hypersphere and hypercube. For each label in the multi-label dataset, these strategies randomly generate a geometric shape (hypersphere or hypercube), which is populated with randomly generated points (instances). Afterwards, each instance is labeled according to the shapes it belongs to, which defines its multi-label. Experiments with a multi-label classification algorithm on six synthetic datasets illustrate the use of Mldatagen.
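The hypersphere strategy described in the abstract can be illustrated with a minimal sketch. This is not Mldatagen's code or API: the function name `generate_hypersphere_dataset`, its parameters, the ranges used for centers and radii, and the uniform sampling scheme are all assumptions made for illustration. Only the overall idea (one random shape per label, random points inside each shape, labels assigned by shape membership) comes from the abstract.

```python
import math
import random


def generate_hypersphere_dataset(n_labels, n_features, points_per_shape, seed=0):
    """Toy version of the hypersphere strategy (illustrative, not Mldatagen's code)."""
    rng = random.Random(seed)

    # One randomly placed hypersphere per label.
    # The center and radius ranges are arbitrary assumptions.
    spheres = []
    for _ in range(n_labels):
        center = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        radius = rng.uniform(0.5, 1.0)
        spheres.append((center, radius))

    instances, labelsets = [], []
    for center, radius in spheres:
        for _ in range(points_per_shape):
            # Uniform point inside the hypersphere: random direction, scaled radius.
            direction = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            norm = math.sqrt(sum(v * v for v in direction))
            scale = radius * rng.random() ** (1.0 / n_features) / norm
            point = [c + scale * v for c, v in zip(center, direction)]

            # Multi-label: 1 for every hypersphere that contains the point.
            # The small tolerance guards against floating-point rounding.
            labels = [
                1 if math.dist(point, c) <= r + 1e-9 else 0
                for c, r in spheres
            ]
            instances.append(point)
            labelsets.append(labels)
    return instances, labelsets, spheres


# Example use: 3 labels, 2 features, 50 points generated per hypersphere.
X, Y, shapes = generate_hypersphere_dataset(n_labels=3, n_features=2, points_per_shape=50)
print(len(X), sum(map(sum, Y)) / len(Y))  # instance count, average labels per instance
```

Because the hyperspheres may overlap, a point generated inside one shape can also fall inside others, which is how an instance ends up with more than one label in this sketch. A hypercube variant would replace the distance test with a per-coordinate bounds check.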

FAPESP grant:	11/12597-6 - Generation of Synthetic Datasets for Multi-label Learning
Grantee:	Jimena Torres Tomas
Support type:	Scholarships in Brazil - Scientific Initiation


FAPESP grant:	10/15992-0 - Exploring Label Dependency in Multi-label Learning
Grantee:	Everton Alvares Cherman
Support type:	Scholarships in Brazil - Direct Doctorate


FAPESP grant:	11/02393-4 - Feature Selection for Multi-label Learning
Grantee:	Newton Spolaôr
Support type:	Scholarships in Brazil - Doctorate
