(Reference retrieved automatically from Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

Optimization and label propagation in bipartite heterogeneous networks to improve transductive classification of texts

Author(s):
Rossi, Rafael Geraldeli [1] ; Lopes, Alneu de Andrade [1] ; Rezende, Solange Oliveira [1]
Total number of authors: 3
Author affiliation(s):
[1] Univ Sao Paulo, Inst Math & Comp Sci, BR-05508 Sao Paulo - Brazil
Total number of affiliations: 1
Document type: Scientific article
Source: INFORMATION PROCESSING & MANAGEMENT; v. 52, n. 2, p. 217-257, MAR 2016.
Web of Science citations: 16
Abstract

Transductive classification is a useful way to classify texts when labeled training examples are insufficient. Several algorithms to perform transductive classification considering text collections represented in a vector space model have been proposed. However, the use of these algorithms is unfeasible in practical applications due to the independence assumption among instances or terms and the drawbacks of these algorithms. Network-based algorithms come up to avoid the drawbacks of the algorithms based on vector space model and to improve transductive classification. Networks are mostly used for label propagation, in which some labeled objects propagate their labels to other objects through the network connections. Bipartite networks are useful to represent text collections as networks and perform label propagation. The generation of this type of network avoids requirements such as collections with hyperlinks or citations, computation of similarities among all texts in the collection, as well as the setup of a number of parameters. In a bipartite heterogeneous network, objects correspond to documents and terms, and the connections are given by the occurrences of terms in documents. The label propagation is performed from documents to terms and then from terms to documents iteratively. Nevertheless, instead of using terms just as means of label propagation, in this article we propose the use of the bipartite network structure to define the relevance scores of terms for classes through an optimization process and then propagate these relevance scores to define labels for unlabeled documents. The new document labels are used to redefine the relevance scores of terms which consequently redefine the labels of unlabeled documents in an iterative process. We demonstrated that the proposed approach surpasses the algorithms for transductive classification based on vector space model or networks. Moreover, we demonstrated that the proposed algorithm effectively makes use of unlabeled documents to improve classification and it is faster than other transductive algorithms. (C) 2015 Elsevier Ltd. All rights reserved. (AU)
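
The baseline idea described in the abstract, propagating labels from documents to terms and then from terms back to documents over a bipartite network, can be illustrated with a minimal Python sketch. This is not the optimization-based algorithm proposed in the article; it is a plain weighted-average propagation written under stated assumptions. The function name `propagate_labels`, the toy collection, the use of raw term counts as edge weights, and the choice to keep labeled documents fixed are all hypothetical choices made for illustration.

```python
def propagate_labels(doc_terms, seed_labels, classes, iterations=10):
    """Illustrative label propagation on a document-term bipartite network.

    doc_terms:   dict mapping document id -> {term: positive weight}
    seed_labels: dict mapping labeled document id -> class
    classes:     list of class names
    Returns a dict mapping every document id -> predicted class.
    """
    # Labeled documents start with a one-hot score; unlabeled ones with zeros.
    doc_scores = {d: {c: 0.0 for c in classes} for d in doc_terms}
    for d, c in seed_labels.items():
        doc_scores[d][c] = 1.0

    # Inverted index: term -> {document id: weight}.
    term_docs = {}
    for d, terms in doc_terms.items():
        for t, w in terms.items():
            term_docs.setdefault(t, {})[d] = w

    for _ in range(iterations):
        # Step 1: documents -> terms (weighted average of document scores).
        term_scores = {}
        for t, docs in term_docs.items():
            total = sum(docs.values())
            term_scores[t] = {
                c: sum(w * doc_scores[d][c] for d, w in docs.items()) / total
                for c in classes
            }
        # Step 2: terms -> documents (labeled documents stay fixed).
        for d, terms in doc_terms.items():
            if d in seed_labels or not terms:
                continue
            total = sum(terms.values())
            doc_scores[d] = {
                c: sum(w * term_scores[t][c] for t, w in terms.items()) / total
                for c in classes
            }

    # Each document receives the class with the highest score.
    return {d: max(classes, key=lambda c: doc_scores[d][c]) for d in doc_terms}


# Toy collection (invented for illustration): d1 and d2 are labeled.
docs = {
    "d1": {"goal": 2, "match": 1},
    "d2": {"vote": 2, "election": 1},
    "d3": {"goal": 1, "team": 1},
    "d4": {"election": 1, "party": 2},
    "d5": {"team": 2, "coach": 1},
}
seeds = {"d1": "sports", "d2": "politics"}
predicted = propagate_labels(docs, seeds, ["sports", "politics"])
```

In this toy collection, `d5` shares no term with a labeled document and is reached only through `d3` via the term `team`, which is why more than one iteration is needed.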

FAPESP grant: 11/12823-6 - Extracting patterns from collections of textual documents using heterogeneous networks
Grantee: Rafael Geraldeli Rossi
Support type: Scholarships in Brazil - Doctorate
FAPESP grant: 11/22749-8 - Challenges in exploratory visualization of multidimensional data: new paradigms, scalability and applications
Grantee: Luis Gustavo Nonato
Support type: Research Grants - Thematic
FAPESP grant: 14/08996-0 - Machine learning for WebSensors: algorithms and applications
Grantee: Solange Oliveira Rezende
Support type: Research Grants - Regular