Deep analysis of word sense disambiguation via semi-supervised learning and neural word representations

Duarte, Jose Marcio; Sousa, Samuel; Milios, Evangelos; Berton, Lilian

Author(s):	Duarte, Jose Marcio ^[1] ; Sousa, Samuel ^[1] ; Milios, Evangelos ^[2] ; Berton, Lilian ^[1] Total Authors: 4
Affiliation:	^[1] Univ Fed Sao Paulo, Inst Sci & Technol, BR-12247014 Sao Jose Dos Campos, SP - Brazil ^[2] Dalhousie Univ, Fac Comp Sci, Halifax, NS B3H 1W5 - Canada Total Affiliations: 2
Document type:	Journal article
Source:	INFORMATION SCIENCES; v. 570, p. 278-297, SEP 2021.
Web of Science Citations:	0
Abstract
Word Sense Disambiguation (WSD) aims to determine the meaning of a word in context. Different approaches have been proposed in supervised and unsupervised domains. In most cases, supervised learning provides superior WSD performance. Because sense-annotated corpora are difficult and time-consuming to obtain, and the annotation effort must be repeated for new domains, languages, and sense inventories, semi-supervised learning (SSL) methods, which combine a small amount of sense-annotated data with unlabeled data, are becoming prominent. In SSL, graph-based methods are common because they capture the relationships between terms using an undirected graph. This paper investigates semi-supervised WSD by considering different graph-based SSL algorithms with features generated by word embeddings from Word2Vec, FastText, GloVe, BERT and ELECTRA models, combined with part-of-speech tags and word context. We test several combinations of word-embedding models, similarity measures for graph construction, and SSL classification algorithms to disambiguate classical lexical sample WSD datasets. The results indicate that our SSL algorithms achieved results competitive with supervised ones and that the ELECTRA models performed better than the other embeddings for SSL. (c) 2021 Elsevier Inc. All rights reserved.
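As an illustration of the general idea of graph-based SSL described in the abstract, the sketch below builds an undirected k-nearest-neighbour graph from cosine similarities and spreads a few known sense labels over it. It is not the authors' pipeline: the function names `cosine_knn_graph` and `propagate_labels`, the toy 2-D vectors standing in for word embeddings, and the choice of a label-spreading update with `alpha=0.9` are all assumptions made for the example.

```python
import numpy as np

def cosine_knn_graph(X, k):
    """Hypothetical helper: undirected kNN graph weighted by cosine similarity."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # keep a node from picking itself
    W = np.zeros_like(sim)
    for i in range(len(X)):
        nbrs = np.argsort(sim[i])[-k:]        # indices of the k most similar nodes
        W[i, nbrs] = np.clip(sim[i, nbrs], 0.0, None)
    return np.maximum(W, W.T)                 # symmetrize: the graph is undirected

def propagate_labels(W, y, n_classes, alpha=0.9, n_iter=100):
    """Hypothetical helper: label spreading; y == -1 marks an unlabeled instance."""
    Y = np.zeros((len(y), n_classes))
    idx = np.flatnonzero(y >= 0)
    Y[idx, y[idx]] = 1.0                      # one-hot rows for the labeled seeds
    d = W.sum(axis=1)
    d[d == 0] = 1.0                           # guard against isolated nodes
    S = W / np.sqrt(np.outer(d, d))           # symmetric degree normalization
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y # diffuse scores, re-inject the seeds
    return F.argmax(axis=1)

# Toy stand-ins for contextual embeddings of one ambiguous word:
# the first three instances point one way, the last three another.
X = np.array([[1.0, 0.1], [0.9, 0.2], [1.0, 0.3],
              [0.1, 1.0], [0.2, 0.9], [0.3, 1.0]])
y = np.array([0, -1, -1, 1, -1, -1])          # one labeled instance per sense

W = cosine_knn_graph(X, k=2)
pred = propagate_labels(W, y, n_classes=2)
print(pred.tolist())  # [0, 0, 0, 1, 1, 1]
```

In this toy setup each unlabeled instance inherits the sense of the single labeled instance in its neighbourhood. Swapping the similarity measure or the embedding source only changes how `W` is built, which is the kind of combination the abstract says the paper compares.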

FAPESP's process:	18/01722-3 - Semi-supervised learning via complex networks: network construction, selection and propagation of labels and applications
Grantee:	Lilian Berton
Support Opportunities:	Regular Research Grants


FAPESP's process:	18/09465-0 - Semi-supervised graph-based algorithms for word sense disambiguation
Grantee:	Samuel Bruno da Silva Sousa
Support Opportunities:	Scholarships in Brazil - Master
