Knowledge-enhanced document embeddings for text classification

Sinoara, Roberta A.; Camacho-Collados, Jose; Rossi, Rafael G.; Navigli, Roberto; Rezende, Solange O.

Author(s):	Sinoara, Roberta A. ^[1] ; Camacho-Collados, Jose ^[2] ; Rossi, Rafael G. ^[3] ; Navigli, Roberto ^[4] ; Rezende, Solange O. ^[1] Total number of authors: 5
Author affiliation(s):	^[1] Univ Sao Paulo, Inst Math & Comp Sci, Lab Computat Intelligence, POB 668, BR-13561970 Sao Carlos, SP - Brazil ^[2] Cardiff Univ, Sch Comp Sci & Informat, Queens Bldg, 5 Parade, Cardiff CF243 AA, S Glam - Wales ^[3] Fed Univ Mato Grosso Do Sul Tres Lagoas Campus, Ranulpho Marques Leal 3484, POB 210, BR-79620080 Tres Lagoas, MS - Brazil ^[4] Sapienza Univ Rome, Dept Comp Sci, Via Regina Elena 295, I-00161 Rome - Italy Total number of affiliations: 4
Document type:	Scientific Article
Source:	KNOWLEDGE-BASED SYSTEMS; v. 163, p. 955-971, JAN 1 2019.
Web of Science citations:	4
Abstract
Accurate semantic representation models are essential in text mining applications. For a successful application of the text mining process, the text representation adopted must keep the interesting patterns to be discovered. Although competitive results for automatic text classification may be achieved with traditional bag of words, such representation model cannot provide satisfactory classification performances on hard settings where richer text representations are required. In this paper, we present an approach to represent document collections based on embedded representations of words and word senses. We bring together the power of word sense disambiguation and the semantic richness of word and word-sense embedded vectors to construct embedded representations of document collections. Our approach results in semantically enhanced and low-dimensional representations. We overcome the lack of interpretability of embedded vectors, which is a drawback of this kind of representation, with the use of word sense embedded vectors. Moreover, the experimental evaluation indicates that the use of the proposed representations provides stable classifiers with strong quantitative results, especially in semantically-complex classification scenarios. (C) 2018 Elsevier B.V. All rights reserved. (AU)

FAPESP grant:	16/17078-0 - Mining, indexing and visualizing Big Data in clinical decision support systems (MIVisBD)
Grantee:	Agma Juci Machado Traina
Support type:	Research Grants - Thematic


FAPESP grant:	13/14757-6 - Incorporating semantics into the construction of websensors
Grantee:	Roberta Akemi Sinoara
Support type:	Scholarships in Brazil - Doctorate


FAPESP grant:	16/07620-2 - Semantic Representation for Text Classification
Grantee:	Roberta Akemi Sinoara
Support type:	Scholarships Abroad - Research Internship - Doctorate
