Word sense induction using word embeddings and community detection in complex networks

Correa, Jr., Edilson A.; Amancio, Diego R.

Author(s):	Correa, Jr., Edilson A. [1]; Amancio, Diego R. [1, 2]. Total number of authors: 2
Author affiliation(s):	[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, SP - Brazil; [2] Indiana Univ, Sch Informat Comp & Engn, Bloomington, IN 47408 - USA. Total number of affiliations: 2
Document type:	Scientific article
Source:	PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS; v. 523, p. 180-190, JUN 1 2019.
Web of Science citations:	0
Abstract
Word Sense Induction (WSI) is the task of automatically inducing word senses from corpora. WSI was first proposed to overcome the limitations of the manually annotated corpora required by word sense disambiguation systems. Although several methods have been proposed to induce word senses, existing systems remain limited because they rely on structured, domain-specific knowledge sources. In this paper, we devise a method that leverages recent findings in word embeddings research to generate context embeddings, that is, embeddings that encode information about the semantic context of a word. To induce senses, we modeled the set of instances of ambiguous words as a complex network, in which two instances (nodes) are connected if their respective context embeddings are similar. Using well-established community detection methods to cluster the obtained context embeddings, we found that the proposed method yields excellent performance on the WSI task. Our method outperformed competing algorithms and baselines in a completely unsupervised manner and without the need for any additional structured knowledge source. (C) 2019 Elsevier B.V. All rights reserved. (AU)
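The pipeline outlined in the abstract (context embeddings, a similarity network over instances, community detection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each context embedding is assumed to be the plain average of the vectors of the surrounding words, two instances are linked when their cosine similarity reaches a hypothetical `threshold`, and a simple deterministic label propagation stands in for the community detection methods actually evaluated in the paper. The toy 2-D word vectors and the helper names (`context_embedding`, `build_network`, `induce_senses`) are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def context_embedding(context_words, word_vectors):
    """Average the vectors of the context words that have an embedding.

    Assumption: plain averaging is only a simple stand-in for the
    paper's context embeddings.
    """
    vecs = [word_vectors[w] for w in context_words if w in word_vectors]
    if not vecs:
        raise ValueError("no context word has an embedding")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_network(embeddings, threshold):
    """Link instances i and j when their context embeddings are similar.

    `threshold` is a hypothetical parameter; the paper may use a different
    rule (e.g. k nearest neighbours) to decide which edges to create.
    """
    adj = {i: set() for i in range(len(embeddings))}
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def label_propagation(adj, max_iter=100):
    """Deterministic label propagation: each node adopts its neighbours'
    most frequent label, breaking ties by the smallest label."""
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue  # isolated instances keep their own label
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def induce_senses(contexts, word_vectors, threshold=0.8):
    """Return a list of clusters (lists of instance indices), one per induced sense."""
    embeddings = [context_embedding(c, word_vectors) for c in contexts]
    labels = label_propagation(build_network(embeddings, threshold))
    clusters = defaultdict(list)
    for node, lab in labels.items():
        clusters[lab].append(node)
    return sorted(clusters.values())


# Toy 2-D "word vectors" invented for illustration only.
WORD_VECTORS = {
    "river": [1.0, 0.0], "water": [0.9, 0.1], "shore": [0.95, 0.05],
    "money": [0.0, 1.0], "loan": [0.1, 0.9], "deposit": [0.05, 0.95],
}
# Six occurrences of the ambiguous word "bank", each given by its context words.
CONTEXTS = [
    ["river", "water"], ["shore", "water"], ["river", "shore"],
    ["money", "loan"], ["loan", "deposit"], ["money", "deposit"],
]

if __name__ == "__main__":
    print(induce_senses(CONTEXTS, WORD_VECTORS))  # [[0, 1, 2], [3, 4, 5]]
```

On the toy data, the three "riverside" occurrences and the three "financial" occurrences form two separate cliques, so each community corresponds to one induced sense of "bank". With real embeddings, both the edge rule and the choice of community detection algorithm would need to follow the paper rather than these placeholders.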

FAPESP grant:	14/20830-0 - Modeling and pattern recognition in texts with complex networks
Grantee:	Diego Raphael Amancio
Support type:	Research Grants - Regular Program

FAPESP grant:	17/13464-6 - Modeling citation and information graphs: a complex network approach
Grantee:	Diego Raphael Amancio
Support type:	Scholarships abroad - Research

FAPESP grant:	16/19069-9 - Document classification using semantic information in complex networks
Grantee:	Diego Raphael Amancio
Support type:	Research Grants - Regular Program
