Word sense disambiguation: an evaluation study of semi-supervised approaches with word embeddings

Sousa, Samuel; Milios, Evangelos; Berton, Lilian; IEEE

Author(s):	Sousa, Samuel; Milios, Evangelos; Berton, Lilian; IEEE
Total Authors:	4
Document type:	Journal article
Source:	2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN); v. N/A, p. 8-pg., 2020-01-01.
Abstract
Word Sense Disambiguation (WSD) is a well-known problem in the field of Natural Language Processing (NLP): automatically determining the most appropriate sense of a word in context. Several machine learning-based approaches have been proposed to tackle the ambiguity of language, but the lack of labeled data to train supervised models has made semi-supervised learning (SSL) an attractive option. Furthermore, the use of word embeddings to enhance the results of NLP tasks has been shown to be an efficient strategy. Thus, this paper aims to adapt semi-supervised algorithms for WSD using word embeddings from Word2Vec, FastText, and BERT models combined with part-of-speech tags as input. We conduct a systematic evaluation of four graph-based SSL models, analyzing the influence on the results of their hyperparameters, the distances used to build the graphs, the percentage of labeled data, and the architectural variations of the word embeddings. As a result, we show that SSL algorithms that receive 10% labeled data are strong baselines on the subsets of nouns and adjectives. Additionally, these algorithms do not need further training to disambiguate new words, which makes them competitive with supervised systems. (AU)
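
For readers unfamiliar with graph-based SSL, the general idea in the abstract can be sketched as follows. This is an illustration only and not the authors' implementation: the abstract does not name the four algorithms evaluated, so the sketch uses a generic clamped label propagation over a k-nearest-neighbour graph built with cosine distance. The function names (`cosine_distance`, `knn_graph`, `propagate_labels`), the value k=3, and the toy 2-D vectors standing in for word embeddings are all assumptions made for the example.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def knn_graph(vectors, k):
    """Build a symmetric k-nearest-neighbour graph as a list of neighbour sets."""
    n = len(vectors)
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine_distance(vectors[i], vectors[j]),
        )
        for j in ranked[:k]:
            neighbours[i].add(j)
            neighbours[j].add(i)  # symmetrise the graph
    return neighbours


def propagate_labels(neighbours, seed_labels, n_classes, iterations=30):
    """Repeatedly average neighbour scores while clamping the labeled seeds."""
    n = len(neighbours)
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, label in seed_labels.items():
        scores[i][label] = 1.0
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in seed_labels:
                updated.append(scores[i])  # seeds keep their known sense
                continue
            row = [0.0] * n_classes
            for j in neighbours[i]:
                for c in range(n_classes):
                    row[c] += scores[j][c]
            degree = len(neighbours[i]) or 1
            updated.append([x / degree for x in row])
        scores = updated
    return [max(range(n_classes), key=lambda c: scores[i][c]) for i in range(n)]


# Toy stand-ins for context embeddings of one ambiguous word (hypothetical data):
# sense 0 occupies directions 0-18 degrees, sense 1 occupies 72-90 degrees.
angles = list(range(0, 20, 2)) + list(range(72, 92, 2))
vectors = [[math.cos(math.radians(a)), math.sin(math.radians(a))] for a in angles]
true_senses = [0] * 10 + [1] * 10

seed_labels = {0: 0, 10: 1}  # 2 of 20 instances (10%) carry a sense label
neighbours = knn_graph(vectors, k=3)
predicted = propagate_labels(neighbours, seed_labels, n_classes=2)

accuracy = sum(p == t for p, t in zip(predicted, true_senses)) / len(true_senses)
print(f"accuracy on toy data: {accuracy:.2f}")
```

Because the labels spread over the graph rather than through a trained classifier, adding a new instance only requires connecting it to its nearest neighbours and propagating again. In practice the vectors would come from an embedding model, and the distance function and k would be tuned, which is the kind of sensitivity the abstract says the study evaluates.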

FAPESP's process:	18/01722-3 - Semi-supervised learning via complex networks: network construction, selection and propagation of labels and applications
Grantee:	Lilian Berton
Support Opportunities:	Regular Research Grants


FAPESP's process:	18/09465-0 - Semi-supervised graph-based algorithms for word sense disambiguation
Grantee:	Samuel Bruno da Silva Sousa
Support Opportunities:	Scholarships in Brazil - Master
