Data heterogeneity consideration in semi-supervised learning

Araujo, Bilza; Zhao, Liang

Author(s):	Araujo, Bilza [1, 2]; Zhao, Liang [3] Total number of authors: 2
Author affiliation(s):	[1] Fed Univ Southern Bahia, Inst Humanities Arts & Sci, BR-45810000 Porto Seguro, BA - Brazil; [2] Univ Sao Paulo, Inst Math & Comp Sci, Dept Comp Sci, BR-13560970 Sao Paulo - Brazil; [3] Univ Sao Paulo, Sch Philosophy Sci & Literature Ribeirao Preto, Dept Computat & Math, BR-14090901 Sao Paulo - Brazil Total number of affiliations: 3
Document type:	Scientific article
Source:	EXPERT SYSTEMS WITH APPLICATIONS; v. 45, p. 234-247, MAR 1 2016.
Web of Science citations:	4
Abstract
In the class (cluster) formation process of machine learning techniques, data instances are usually assumed to have equal relevance. However, this assumption frequently does not hold. The situation is especially typical in semi-supervised learning, where the structure of both labeled and unlabeled data must be understood at the same time. In this paper, we investigate the organizational heterogeneity of data in semi-supervised learning using a graph representation. A graph is a natural choice for characterizing the relationship between any pair of nodes or any pair of groups of nodes; consequently, the strategic location of each node or group of nodes can be determined by graph measures. Specifically, two issues are addressed: (1) We propose an adaptive graph construction method, which we call AdaRadius, that considers the heterogeneity of the local interacting structure among nodes. As a result, it presents several interesting properties, namely adaptability to data density variations, low dependency on parameter settings, and reasonable computational cost, for both pool-based and incremental data. (2) We also present heuristic criteria for selecting representative data samples to be labeled. An experimental study shows that selective labeling usually yields better classification results than random labeling. To our knowledge, both issues have so far lacked investigation; therefore, our approach represents an important step toward characterizing data heterogeneity not only in semi-supervised learning but also in machine learning in general. (C) 2015 Elsevier Ltd. All rights reserved. (AU)
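
The abstract gives no algorithmic details, so the following is only a generic sketch of the two ideas it names: a neighborhood graph whose connection radius adapts to local density, and a graph measure used to pick samples for labeling. The rule used here (each node's radius is the distance to its k-th nearest neighbor) and the degree-based selection are illustrative assumptions, not the AdaRadius method or the heuristic criteria from the paper; the function names `adaptive_radius_graph` and `select_by_degree` are hypothetical.

```python
import math


def adaptive_radius_graph(points, k=2):
    """Build an undirected graph with a per-node, density-dependent radius.

    ASSUMPTION: the local radius of a node is the distance to its k-th
    nearest neighbor. This is a stand-in, not the AdaRadius rule.
    Requires len(points) > k.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local radius: k-th smallest distance from node i to any other node.
    radius = [
        sorted(d for j, d in enumerate(dist[i]) if j != i)[k - 1]
        for i in range(n)
    ]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # Connect when j lies within i's radius or i within j's radius.
            if dist[i][j] <= max(radius[i], radius[j]):
                adj[i].add(j)
                adj[j].add(i)
    return adj


def select_by_degree(adj, budget):
    """Pick `budget` nodes to label, highest degree first (ties by index).

    ASSUMPTION: degree is used as the graph measure of representativeness.
    """
    return sorted(adj, key=lambda i: (-len(adj[i]), i))[:budget]
```

Because each radius is derived from the node's own neighborhood, dense regions get short radii and sparse regions get long ones, which is one simple way to adapt to density variations without a single global threshold.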

FAPESP grant:	13/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry
Grantee:	Francisco Louzada Neto
Support type:	Research Grants - Research, Innovation and Dissemination Centers (CEPIDs)


FAPESP grant:	11/50151-0 - Dynamical phenomena in complex networks: fundamentals and applications
Grantee:	Elbert Einstein Nehrer Macau
Support type:	Research Grants - Thematic
