(Reference retrieved automatically from Web of Science through information on FAPESP grant and its corresponding number as mentioned in the publication by the authors.)

A network-based positive and unlabeled learning approach for fake news detection

Author(s):
de Souza, Mariana Caravanti [1] ; Nogueira, Bruno Magalhaes [2] ; Rossi, Rafael Geraldeli [3] ; Marcacini, Ricardo Marcondes [1] ; dos Santos, Brucce Neves [1] ; Rezende, Solange Oliveira [1]
Total Authors: 6
Affiliation:
[1] ICMC USP, BR-13566590 Sao Carlos - Brazil
[2] FACOM UFMS, BR-79070900 Campo Grande, MS - Brazil
[3] CPTL UFMS, BR-79613000 Tres Lagoas - Brazil
Total Affiliations: 3
Document type: Journal article
Source: MACHINE LEARNING; NOV 2021.
Web of Science Citations: 0
Abstract

Fake news can spread rapidly among internet users and deceive a large audience. Because of these characteristics, it can have a direct impact on political and economic events. Machine learning approaches have been used to assist fake news identification. However, since the spectrum of real news is broad, hard to characterize, and expensive to label owing to its high update frequency, One-Class Learning (OCL) and Positive and Unlabeled Learning (PUL) emerge as interesting approaches for content-based fake news detection that use a smaller set of labeled data than traditional machine learning techniques. In particular, network-based approaches are well suited to fake news detection because they allow information from different aspects of a publication to be incorporated into the problem modeling. In this paper, we propose a network-based approach based on Positive and Unlabeled Learning by Label Propagation (PU-LP), a one-class, transductive semi-supervised learning algorithm that performs classification by first identifying potential interest and non-interest documents in the unlabeled data and then propagating labels to classify the remaining unlabeled documents. We assessed the performance of our proposal considering homogeneous (only documents) and heterogeneous (documents and terms) networks. Our comparative analysis considered four OCL algorithms extensively employed in one-class text classification (k-Means, k-Nearest Neighbors Density-based, One-Class Support Vector Machine, and Dense Autoencoder) and another traditional PUL algorithm (Rocchio Support Vector Machine). The algorithms were evaluated on three news collections, considering balanced and extremely unbalanced scenarios. We used Bag-of-Words and Doc2Vec models to transform news into structured data.
Results indicated that PU-LP approaches are more stable and achieve better results than other PUL and OCL approaches in most scenarios, performing similarly to semi-supervised binary algorithms. Also, the inclusion of terms in the news network yields better results, especially when news items are distributed in the feature space according to veracity and subject. News representation using Doc2Vec achieved better results than the Bag-of-Words model, both for algorithms based on the vector-space model and for those based on the document similarity network. (AU)
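
The two-step procedure described in the abstract (first identify likely interest and non-interest documents among the unlabeled data, then propagate labels over a document network) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The cosine k-nearest-neighbour graph, the closed-form propagation rule F = (I - alpha S)^-1 Y, the scoring rule used to pick reliable documents, the function names (knn_graph, propagate, pu_lp_sketch), and the parameters k, n_reliable, and alpha are all assumptions chosen for illustration, and the toy vectors stand in for Bag-of-Words or Doc2Vec representations.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-nearest-neighbour adjacency matrix (cosine similarity)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)               # exclude self-loops
    nearest = np.argsort(-sim, axis=1)[:, :k]    # k most similar documents per row
    W = np.zeros((len(X), len(X)))
    W[np.repeat(np.arange(len(X)), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)                    # make the graph undirected

def propagate(W, Y, alpha):
    """Closed-form label propagation: F = (I - alpha * S)^-1 Y."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))              # symmetric degree normalisation
    return np.linalg.solve(np.eye(len(W)) - alpha * S, Y)

def pu_lp_sketch(X, positive_idx, k=5, n_reliable=2, alpha=0.9):
    """Hypothetical two-step PU procedure; True marks predicted interest documents."""
    n = len(X)
    W = knn_graph(X, k)
    positive_idx = np.asarray(positive_idx)
    unlabeled = np.setdiff1d(np.arange(n), positive_idx)

    # Step 1 (assumed scoring rule): rank unlabeled documents by their
    # graph proximity to the labeled positive documents.
    seed = np.zeros((n, 1))
    seed[positive_idx] = 1.0
    score = propagate(W, seed, alpha)[:, 0]
    order = unlabeled[np.argsort(score[unlabeled])]
    reliable_neg = order[:n_reliable]            # farthest from the positives
    reliable_pos = order[-n_reliable:]           # closest to the positives

    # Step 2: propagate both classes and classify every document.
    Y = np.zeros((n, 2))
    Y[np.concatenate([positive_idx, reliable_pos]), 0] = 1.0
    Y[reliable_neg, 1] = 1.0
    F = propagate(W, Y, alpha)
    return F[:, 0] > F[:, 1]

# Toy data: two well-separated groups of 8 "documents" in a 2-D feature space.
rng = np.random.default_rng(0)
fake = np.array([1.0, 0.0]) + 0.05 * rng.standard_normal((8, 2))
real = np.array([0.0, 1.0]) + 0.05 * rng.standard_normal((8, 2))
X = np.vstack([fake, real])

# Only two fake-news documents carry a label; the rest are unlabeled.
is_fake = pu_lp_sketch(X, positive_idx=[0, 1])
```

In this sketch only the interest class (fake news) is ever labeled by hand, which mirrors the PUL setting: the non-interest seeds are inferred from the network rather than supplied by an annotator.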

FAPESP's process: 19/25010-5 - Semantically enriched representations for Portuguese text mining: models and applications
Grantee: Solange Oliveira Rezende
Support Opportunities: Regular Research Grants