Fast Bipartite Forests for Semi-supervised Interaction Prediction

Ilidio, Pedro; Alves, Andre; Cerri, Ricardo

Author(s):	Ilidio, Pedro; Alves, Andre; Cerri, Ricardo
Total Authors:	3
Document type:	Journal article
Source:	39th Annual ACM Symposium on Applied Computing, SAC 2024; 8 pp., 2024-01-01.
Abstract
Numerous machine learning tasks can be framed as the prediction of interactions in a bipartite network, such as relationships between proteins and drug molecules, genes and transcription factors, or microRNAs and messenger RNAs. Such tasks present unique characteristics and challenges often overlooked by existing strategies, namely the high dimensionality and sparsity of available labels or the infeasibility of validating predictions for all possible pairs of entities. In this study, we investigate machine learning approaches tailored to these settings and propose refinements to forest algorithms that improve their complexity by a factor of log n. We systematically re-implement and compare predecessor strategies on ten bipartite datasets, shedding light on their performance across diverse interaction prediction contexts. Bipartite decision forests outperformed several traditional algorithms in binary interaction prediction and surpassed cutting-edge deep learning models in drug-protein affinity regression. We also assess the impact of missing positive interactions, comparing existing and newly proposed tree-based semi-supervised approaches. Our proposed forests employing semi-supervised impurities display notable resilience in this scenario, an especially relevant result for interaction prediction.

FAPESP's process:	22/02981-8 - Novelty detection in multi-label data streams classification
Grantee:	Ricardo Cerri
Support Opportunities:	Research Grants - Initial Project
