Automatic detection of fake tweets about the COVID-19 Vaccine in Portuguese

Geurgas, Rafael; Tessler, Leandro R.

Author(s):	Geurgas, Rafael ; Tessler, Leandro R. Total number of authors: 2
Document type:	Scientific Article
Source:	SOCIAL NETWORK ANALYSIS AND MINING; v. 14, n. 1, p. 10-pg., 2024-03-08.
Abstract
The COVID-19 pandemic triggered an unprecedented wave of disinformation on social media in Brazil. In particular, Twitter (now X) was used to spread fake news about COVID-19 vaccines, which contributed to vaccine hesitancy. This article presents a BERT-based neural network for the automatic detection of fake tweets. The optimized architecture relies on BERTimbau, a BERT model pre-trained on Brazilian Portuguese, fine-tuned with three fully connected layers. All 2,857,908 tweets in Portuguese containing the word vacina (Portuguese for vaccine) were collected over 7 months. A random subset of 16,731 tweets was manually classified as real or fake. Of these, 2,309 were discarded for being about non-COVID-19 vaccines and 422 were discarded for containing irony. Of the remaining 14,000 tweets, 1,144 were labeled fake and 12,856 real. To balance the training dataset, the network was fine-tuned using the 1,144 curated fake tweets and a random sample of 2,000 real tweets. Optimal results were achieved by unfreezing the last four layers of BERTimbau. The best results obtained were a 77.1% F1-score and 76.9% accuracy. These results are already acceptable for practical applications and can be improved by increasing the size of the training dataset. A weighted F1-score of 96.3% was obtained by training the same neural network architecture, with the same hyperparameters, on a larger curated, balanced English-language training dataset. (AU)
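The dataset arithmetic and the class-balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the placeholder tweet identifiers, and the random seed are assumptions made for illustration, and only the counts are taken from the abstract. The metric helper uses the standard definitions of binary F1-score and accuracy.

```python
import random

# Counts reported in the abstract.
LABELED = 16_731      # manually classified tweets
NON_COVID = 2_309     # discarded: about non-COVID-19 vaccines
IRONIC = 422          # discarded: contained irony
FAKE = 1_144          # remaining tweets labeled fake
REAL = 12_856         # remaining tweets labeled real
REAL_SAMPLE = 2_000   # real tweets sampled for fine-tuning


def remaining_after_filtering(labeled, non_covid, ironic):
    """Number of tweets left once both discard rules are applied."""
    return labeled - non_covid - ironic


def build_training_set(fake, real, n_real, seed=0):
    """Keep every fake tweet plus a random sample of n_real real tweets.

    Returns a shuffled list of (tweet, label) pairs, with label 1 = fake
    and label 0 = real. The seed is a hypothetical choice.
    """
    rng = random.Random(seed)
    sampled_real = rng.sample(real, n_real)
    data = [(t, 1) for t in fake] + [(t, 0) for t in sampled_real]
    rng.shuffle(data)
    return data


def f1_and_accuracy(y_true, y_pred, positive=1):
    """Binary F1-score for the positive class, and overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return f1, accuracy


# Placeholder identifiers standing in for the curated tweets.
fake_tweets = [f"fake_{i}" for i in range(FAKE)]
real_tweets = [f"real_{i}" for i in range(REAL)]

remaining = remaining_after_filtering(LABELED, NON_COVID, IRONIC)
train = build_training_set(fake_tweets, real_tweets, REAL_SAMPLE)

# Share of fake tweets before and after subsampling the real class.
fake_share_before = FAKE / remaining   # about 8%
fake_share_after = FAKE / len(train)   # about 36%
```

Subsampling the real class raises the share of fake tweets in the fine-tuning data from roughly 8% to roughly 36%, so the resulting set is less imbalanced rather than strictly balanced.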

FAPESP grant:	20/09838-0 - BI0S - Brazilian Institute of Data Science
Grantee:	João Marcos Travassos Romano
Support type:	Research Grants - Research Centers in Engineering Program
