Politically-oriented information inference from text

da Silva, Samuel Caetano; Paraboni, Ivandre

Author(s):	da Silva, Samuel Caetano; Paraboni, Ivandre. Total number of authors: 2
Document type:	Scientific Article
Source:	JOURNAL OF UNIVERSAL COMPUTER SCIENCE; v. 29, n. 6, p. 26-pg., 2023-01-01.
Abstract
The inference of politically-oriented information from text data is a popular research topic in Natural Language Processing (NLP) at both the text and author levels. In recent years, studies of this kind have been implemented with the aid of text representations ranging from simple count-based models (e.g., bag-of-words) to sequence-based models built from transformers (e.g., BERT). Despite considerable success, however, we may still ask whether results may be improved further by combining these models with additional text representations. To shed light on this issue, the present work describes a series of experiments to compare a number of strategies for political bias and ideology inference from text data using sequence-based BERT models and syntax- and semantics-driven features, and examines which of these representations (or their combinations) improve overall model accuracy. Results suggest that one particular strategy, namely the combination of BERT language models with syntactic dependencies, significantly outperforms well-known count- and sequence-based text classifiers alike. In particular, the combined model has been found to improve accuracy across all tasks under consideration, outperforming the SemEval hyperpartisan news detection top-performing system by up to 6%, and outperforming the use of BERT alone by up to 21%, making a potentially strong case for the use of heterogeneous text representations in the present tasks. (AU)

FAPESP grant:	21/08213-0 - Language analysis in social networks for early detection of mental health disorders
Grantee:	Ivandre Paraboni
Support type:	Research Grants - Regular
