
Embedding Methods for Predicting Multilabel Interactions between piRNAs and Transposable Elements

Grant number: 25/03525-4
Support Opportunities: Scholarships in Brazil - Scientific Initiation
Start date: June 01, 2025
End date: December 31, 2025
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Ricardo Cerri
Grantee: Maria Victória Brandão Barros
Host Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil

Abstract

Transposable elements (TEs) are DNA sequences capable of moving within the genome, influencing genetic evolution. piRNAs are a class of small non-coding RNAs that play a crucial role in silencing TEs, contributing to species' reproductive stability. Identifying interactions between piRNAs and TEs through laboratory experiments is time-consuming and costly, making the development of efficient computational methods necessary.

The challenges of this problem include its multi-label nature (a piRNA can interact with multiple TEs and vice versa), high data sparsity (few known positive interactions), and computational complexity (handling large volumes of biological data).

This project aims to develop and evaluate machine learning models using embedding methods to optimize the prediction of interactions between piRNAs and TEs. The specific objectives include: developing and evaluating embedding-based methods in the label space to handle high sparsity; implementing and comparing machine learning models for interaction prediction; and establishing a comparative analysis using multiple evaluation metrics.

The data come from the study "Identification of piRNA Binding Sites Reveals the Argonaute Regulatory Landscape of the C. elegans Germline", which contains 19,092 interactions recorded in vivo. To structure this information, the data are organized into a binary matrix in which rows represent piRNAs and columns represent TEs. Known interactions are represented by 1 and unknown ones by 0, resulting in a highly sparse matrix.

Prediction will be performed using supervised learning, exploring a local approach that applies embeddings separately in the piRNA and TE spaces, reducing dimensionality to address high sparsity.

Given the severe imbalance between known and unknown interactions, traditional metrics such as accuracy are inadequate. Performance will be evaluated using AUROC (Area Under the ROC Curve) and AUPRC (Area Under the Precision-Recall Curve).

This study has the potential to contribute significantly to bioinformatics by proposing an efficient computational method for a complex biological problem.
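
The data layout and evaluation described in the abstract can be illustrated with a minimal sketch. It is not the project's method: the abstract does not name an embedding technique, so a truncated SVD of the interaction matrix is used here purely as a stand-in, and the function names (`build_interaction_matrix`, `svd_embed`, `auroc`, `average_precision`) and the toy index pairs are hypothetical. Average precision is used as a common estimate of AUPRC.

```python
import numpy as np

def build_interaction_matrix(pairs, n_pirnas, n_tes):
    """Binary matrix: rows are piRNAs, columns are TEs, 1 = known interaction."""
    Y = np.zeros((n_pirnas, n_tes), dtype=np.int8)
    for i, j in pairs:
        Y[i, j] = 1
    return Y

def svd_embed(Y, k):
    """Stand-in embedding (assumption): rank-k SVD factors for rows and columns."""
    U, s, Vt = np.linalg.svd(Y.astype(float), full_matrices=False)
    scale = np.sqrt(s[:k])
    return U[:, :k] * scale, Vt[:k, :].T * scale

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5).
    Pairwise comparison, so only suitable for small inputs."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

def average_precision(y_true, scores):
    """Mean of precision@k over the ranks k at which a positive is retrieved."""
    y_true = np.asarray(y_true).astype(int)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

# Toy data: 6 hypothetical piRNAs x 5 hypothetical TEs with 4 known interactions.
Y = build_interaction_matrix([(0, 1), (2, 0), (3, 4), (5, 1)], n_pirnas=6, n_tes=5)
pirna_emb, te_emb = svd_embed(Y, k=2)
scores = pirna_emb @ te_emb.T  # low-rank interaction scores

# Predicting "no interaction" everywhere already gives high accuracy on a sparse
# matrix, which is why accuracy says little here.
all_zero_accuracy = float((Y == 0).mean())
# Scored on the same entries used to fit the SVD, for illustration only.
toy_auroc = auroc(Y.ravel(), scores.ravel())
toy_ap = average_precision(Y.ravel(), scores.ravel())
```

In a real experiment the scores would be evaluated on held-out interactions rather than on the entries used to fit the embedding, and a sparse matrix format would likely replace the dense array.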
