| Grant number: | 25/03525-4 |
| Support Opportunities: | Scholarships in Brazil - Scientific Initiation |
| Start date: | June 01, 2025 |
| End date: | December 31, 2025 |
| Field of knowledge: | Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques |
| Principal Investigator: | Ricardo Cerri |
| Grantee: | Maria Victória Brandão Barros |
| Host Institution: | Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil |
Abstract

Transposable elements (TEs) are DNA sequences that can move within the genome and thereby influence genetic evolution. piRNAs are a class of small non-coding RNAs that play a crucial role in silencing TEs, contributing to the reproductive stability of species. Identifying interactions between piRNAs and TEs through laboratory experiments is time-consuming and costly, which makes efficient computational methods necessary.

The challenges of this problem include its multi-label nature (a piRNA can interact with multiple TEs and vice versa), high data sparsity (few known positive interactions), and computational complexity (handling large volumes of biological data).

This project aims to develop and evaluate machine learning models that use embedding methods to improve the prediction of interactions between piRNAs and TEs. The specific objectives are: to develop and evaluate embedding-based methods in the label space to handle the high sparsity; to implement and compare machine learning models for interaction prediction; and to carry out a comparative analysis using multiple evaluation metrics.

The data come from the study "Identification of piRNA Binding Sites Reveals the Argonaute Regulatory Landscape of the C. elegans Germline", which contains 19,092 interactions recorded in vivo. To structure this information, the data are organized into a binary matrix in which rows represent piRNAs and columns represent TEs. Known interactions are represented by 1 and unknown ones by 0, resulting in a highly sparse matrix.

Prediction will be performed with supervised learning, exploring a local approach that applies embeddings separately in the piRNA space and in the TE space, reducing dimensionality to address the high sparsity.

Given the severe imbalance between known and unknown interactions, traditional metrics such as accuracy are inadequate: a model that predicts "no interaction" for every pair would already achieve high accuracy. Performance will therefore be evaluated using AUROC (Area Under the ROC Curve) and AUPRC (Area Under the Precision-Recall Curve).

This study has the potential to contribute to bioinformatics by proposing an efficient computational method for a complex biological problem.
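The matrix formulation and evaluation protocol described in the abstract can be illustrated with a small sketch. It is not the project's implementation: the piRNA and TE identifiers, the interaction pairs, and the model scores below are made-up toy values, and the scores are a hypothetical placeholder for the output of whatever embedding-based model the project develops. The sketch builds the binary piRNA × TE matrix from known pairs, shows that an all-zero predictor already reaches high accuracy on a sparse matrix while its AUROC stays at 0.5, and computes AUROC (as the pairwise ranking probability, i.e. the Mann-Whitney formulation) and AUPRC (estimated here as average precision, which is one common estimator, assumed for illustration).

```python
"""Toy sketch: binary piRNA x TE matrix plus AUROC/AUPRC evaluation.

All identifiers, pairs and scores are hypothetical illustration data.
"""


def build_interaction_matrix(pairs, pirnas, tes):
    """Return a 0/1 matrix with rows = piRNAs and columns = TEs.

    1 marks a known interaction; 0 means unknown (not a confirmed negative).
    """
    row_of = {p: i for i, p in enumerate(pirnas)}
    col_of = {t: j for j, t in enumerate(tes)}
    matrix = [[0] * len(tes) for _ in pirnas]
    for pirna, te in pairs:
        matrix[row_of[pirna]][col_of[te]] = 1
    return matrix


def sparsity(matrix):
    """Fraction of cells that are 0."""
    cells = sum(len(row) for row in matrix)
    ones = sum(sum(row) for row in matrix)
    return (cells - ones) / cells


def accuracy(labels, predictions):
    """Fraction of positions where the prediction equals the label."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:  # O(P * N) comparisons: fine for a toy example only
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auprc(labels, scores):
    """AUPRC estimated as average precision over the ranks of the positives.

    Ties are broken by input order, so tied scores make the value order-dependent.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    if hits == 0:
        raise ValueError("AUPRC needs at least one positive")
    return total / hits


# Hypothetical toy data (NOT the C. elegans dataset).
PIRNAS = ["pi1", "pi2", "pi3", "pi4"]
TES = ["te1", "te2", "te3", "te4", "te5"]
PAIRS = [("pi1", "te1"), ("pi2", "te3"), ("pi4", "te5")]
# Placeholder scores standing in for a trained model's output, one per matrix cell.
SCORES = [
    [0.9, 0.1, 0.2, 0.1, 0.3],
    [0.2, 0.4, 0.8, 0.1, 0.1],
    [0.3, 0.2, 0.1, 0.6, 0.2],
    [0.1, 0.3, 0.2, 0.2, 0.5],
]

if __name__ == "__main__":
    matrix = build_interaction_matrix(PAIRS, PIRNAS, TES)
    labels = [y for row in matrix for y in row]
    flat_scores = [s for row in SCORES for s in row]
    print("sparsity:", sparsity(matrix))  # 0.85
    print("all-zero accuracy:", accuracy(labels, [0] * len(labels)))  # 0.85
    print("constant-score AUROC:", auroc(labels, [0.0] * len(labels)))  # 0.5
    print("AUROC:", round(auroc(labels, flat_scores), 3))  # 0.98
    print("AUPRC:", round(auprc(labels, flat_scores), 3))  # 0.917
```

The all-zero predictor matches the matrix sparsity in accuracy (0.85 here) without ranking a single interaction correctly, which is why ranking-based metrics are preferred. At the scale of the real data (19,092 known interactions), a sparse matrix representation and library routines such as scikit-learn's `roc_auc_score` and `average_precision_score` would be more practical than the quadratic loops used here.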