

A distantly supervised approach for recognizing product mentions in user-generated content

Author(s):
Vieira, Henry S.; da Silva, Altigran S.; Calado, Pavel; de Moura, Edleno S.
Total number of authors: 4
Document type: Scientific article
Source: JOURNAL OF INTELLIGENT INFORMATION SYSTEMS; v. N/A, 24 pp., 2022-05-27.
Abstract

As online purchasing becomes more popular, users trust information published on social media more than advertising content. Opinion mining is often applied to social media, and opinion target extraction is one of its main sub-tasks. In this paper, we focus on recognizing target entities related to electronic products. We propose ProdSpot, a method based on the distant supervision paradigm for training a named entity extractor to identify product mentions in user text. ProdSpot relies only on an unlabeled set of product offer titles and a list of product brand names. First, surface forms are identified from the product titles. Then, given a collection of user posts, the method selects sentences that contain at least one surface form and labels them automatically. A cluster-based filtering strategy is applied to detect and filter out possibly mislabeled sentences. Finally, data augmentation is used to produce more general and diverse training data. The resulting set of augmented sentences forms the training set for a recognition model. Experiments show that the automatically generated training data yields results similar to those achieved by a supervised model. Our best precision is only 9% lower than that of a supervised model, while our recall is approximately 7% higher in two distinct product categories. Compared to a state-of-the-art supervised method specifically designed to recognize mobile phone names, our method achieves competitive results, with F1 values only 4% lower, while requiring no user supervision. Our filtering and data augmentation steps directly influence these results. (AU)
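The labeling step described in the abstract (selecting sentences that contain at least one surface form and labeling them automatically) can be illustrated as a dictionary match that emits BIO tags. This is a minimal sketch under stated assumptions, not ProdSpot's implementation: surface forms are assumed to be already extracted as token lists, matching is assumed to be case-insensitive greedy longest-match, and the tag name `PROD`, the functions `label_sentence` and `distant_label`, and the example data are hypothetical. Surface-form extraction, cluster-based filtering, and data augmentation are not shown.

```python
from typing import List, Tuple

# Hypothetical entity tag; the paper's actual label set is not given here.
TAG = "PROD"


def label_sentence(tokens: List[str], surface_forms: List[List[str]]) -> List[str]:
    """Assign BIO tags by greedy, case-insensitive longest match against surface forms."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    # Try longer surface forms first so a full model name wins over its prefix.
    forms = sorted(
        (tuple(t.lower() for t in f) for f in surface_forms if f),
        key=len,
        reverse=True,
    )
    i = 0
    while i < len(tokens):
        match_len = 0
        for form in forms:
            if tuple(lowered[i:i + len(form)]) == form:
                match_len = len(form)
                break
        if match_len:
            tags[i] = "B-" + TAG
            for j in range(i + 1, i + match_len):
                tags[j] = "I-" + TAG
            i += match_len  # skip past the matched mention
        else:
            i += 1
    return tags


def distant_label(
    sentences: List[List[str]], surface_forms: List[List[str]]
) -> List[Tuple[List[str], List[str]]]:
    """Keep only sentences with at least one surface-form match, paired with their tags."""
    labeled = []
    for tokens in sentences:
        tags = label_sentence(tokens, surface_forms)
        if "B-" + TAG in tags:
            labeled.append((tokens, tags))
    return labeled


if __name__ == "__main__":
    # Hypothetical user posts and surface forms, for illustration only.
    posts = [["I", "love", "my", "Galaxy", "S21", "Ultra"], ["Great", "weather", "today"]]
    forms = [["Galaxy", "S21"], ["Galaxy", "S21", "Ultra"]]
    for tokens, tags in distant_label(posts, forms):
        print(list(zip(tokens, tags)))
```

Preferring the longest match keeps a mention such as the hypothetical "Galaxy S21 Ultra" from being cut short at "Galaxy S21", and sentences with no match are dropped, mirroring the selection of sentences that contain at least one surface form. Labels produced this way are noisy, which is the motivation the abstract gives for the subsequent filtering step.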

FAPESP grant: 20/05173-4 - A multimodal approach to identify bias in digital social media
Grantee: Altigran Soares da Silva
Support type: Regular Research Grant