Origins and functional potential of retrocopies in the genome of humans and other animals: a large-scale approach to identify retrocopies in diverse animal genomes and study their expression pattern in normal human tissues.

Author(s):
Helena Beatriz da Conceição
Total Authors: 1
Document type: Doctoral Thesis
Press: São Paulo.
Institution: Universidade de São Paulo (USP). Instituto de Matemática e Estatística (IME/SBI)
Defense date:
Examining board members:
Pedro Alexandre Favoretto Galante; Katlin Brauer Massirer; Alexandre Rossi Paschoal; Tatiana Teixeira Torres
Advisor: Pedro Alexandre Favoretto Galante
Abstract

Processed pseudogenes, also known as retrocopies, are copies of coding genes that originate through RNA-mediated duplication. They are characterized by the conservation of only the exons of their parental genes, the absence of introns, the frequent presence of poly-A tails integrated into the genome, and the absence of parental promoter regions. These characteristics have been used to identify retrocopies since the 1980s, when many human retroduplicated genes were first reported. However, the systematic search for retrocopies became possible only after the complete sequencing and assembly of the human genome, enabling the development of more advanced sequencing technologies, improvements in transcriptome annotation, and the emergence of new computational tools. In the early 2000s, the first comprehensive analyses to identify retrocopies in the human genome and other species were conducted, followed by additional studies that continue to this day. However, the literature on the identification and functional analysis of retrocopies still lacks in-depth analyses and more comprehensive approaches. In this thesis, we present a systematic and comprehensive investigation to identify retrocopies in 44 species, ranging from humans to invertebrates. First, we constructed a pipeline to identify, characterize, and organize the 219,948 retrocopies of these 44 organisms and to make this information available via the web. All information, including genomic position, parental genes, retrocopy size, expression, and conservation between species, among others, was organized in a public database, RCPedia 2.0. In a complementary study, we investigated the impact and functional potential of transcribed retrocopies in the human genome. For this, we conducted complex analyses combining RNA sequencing data from multiple tissues, epigenetic data, and Ribosome Sequencing to first elucidate retrocopy expression and how it may be regulated, and then evaluate retrocopy functionality.
We found that approximately 50% (around 4,000) of retrocopies in the human genome are expressed and have their expression levels regulated in healthy tissues. About 25% of these retrocopies are expressed in only one tissue (mainly in the testes), while approximately 15% of them are expressed in all investigated human tissues. Our data indicate that the driving force for the expression of these retrocopies is their genomic proximity to protein-coding genes or the (older) age of origin of these retrocopies. We further confirmed that a subset of retrocopies is translated, emphasizing their functional potential. Therefore, in this thesis, we highlight a frequently overlooked segment of the transcriptome of humans and other species: retrocopies (or processed pseudogenes). We not only reveal their considerable functional potential and their ability to generate genetic innovations through the retrotransposition of coding genes but also lay the groundwork for a comprehensive and specific exploration of each of these numerous retrocopies. (AU)

FAPESP's process: 18/13613-4 - Study of the contribution of retrocopies in the origin of new genetic regions
Grantee: Helena Beatriz da Conceição
Support Opportunities: Scholarships in Brazil - Doctorate (Direct)