Combining full length transcription and bioinformatics pipelines to unravel the functional potential and improving annotation of retrocopies using a human population-scale dataset.

Grant number: 23/11391-2
Support Opportunities: Scholarships abroad - Research Internship - Doctorate (Direct)
Effective date (Start): September 30, 2024
Effective date (End): December 29, 2024
Field of knowledge: Biological Sciences - Biochemistry - Molecular Biology
Principal Investigator: Pedro Alexandre Favoretto Galante
Grantee: Rafael Luiz Vieira Mercuri
Supervisor: Melina Claussnitzer
Host Institution: Hospital Sírio-Libanês. Sociedade Beneficente de Senhoras (SBSHSL). São Paulo, SP, Brazil
Research place: Broad Institute, United States
Associated to the scholarship: 20/02413-4 - Intragenic retrocopies as a source of novel protein domains in humans, BP.DD

Abstract

Retrocopies, also known as retroposed copies or processed pseudogenes, arise from the reverse transcription of messenger RNA (mRNA) molecules and their subsequent integration into the genome. While many retrocopies were historically considered non-functional pseudogenes, advances in omics technologies and bioinformatics have revealed many potentially functional retrocopies. The human genome contains approximately 8,000 fixed retrocopies, and large-scale studies indicate that over 40% of these are actively transcribed. Understanding the impact and function of retrocopies in the human genome is pivotal to clarifying their potential roles in disease development and evolution. Although RNA sequencing has proven instrumental in identifying expressed retrocopies, its short read length limits our ability to determine the complete (full-length) transcripts of retrocopies. Since most retrocopies are partial gene copies and some are embedded within the introns of protein-coding genes, obtaining their full-length transcripts is crucial for elucidating their functions. Only full-length transcripts can reveal both coding (CDS-containing) and non-coding retrocopies, as well as the chimeric transcripts formed between retrocopies and their host genes. The MAS-ISO-seq method offers a promising way to overcome the obstacles to high-throughput identification and quantification of full-length RNA isoforms in both single-cell and bulk data. It programmably concatenates complementary DNAs (cDNAs) into longer molecules that are well suited to long-read sequencing. This project aims to use MAS-ISO-seq data to accurately define the transcribed regions of retrocopies present in the human genome, to identify retrocopies with functional potential (retrogenes), and to construct a computational pipeline for analyzing retrocopy expression with long reads.
In the end, we believe that a deeper understanding of full-length retrocopies will reveal novel (retro)genes, offering fresh insights into genome evolution and identifying new genomic regions associated with diseases.
