Development of scalable systems for genomic research at high-performance computing environments

Author(s):
Wélliton de Souza
Total Authors: 1
Document type: Doctoral Thesis
Press: Campinas, SP.
Institution: Universidade Estadual de Campinas (UNICAMP). Faculdade de Ciências Médicas
Defense date:
Examining board members:
Íscia Teresinha Lopes Cendes; Andre Schwambach Vieira; Diego Fernando Troggian Veiga; Mônica Barbosa de Melo; Wilson Araújo da Silva Junior
Advisor: Íscia Teresinha Lopes Cendes
Abstract

High-throughput sequencing technologies and the growing demand for large-scale analysis of genomic data sets have created computational and reproducibility challenges. Large volumes of data require systems optimized for efficient execution in high-performance environments as research projects expand and new computational resources are acquired. In this context, processing protocols have become more complex as sequencing techniques have been developed for areas other than genomics, such as transcriptomics and epigenomics. These protocols are composed of dozens of tasks that must be performed in a workflow that may branch and use parallel techniques, making it difficult to publish completely reproducible research, a requirement that is increasingly present in the literature. In this work, reproducible pipelines were described in the Workflow Description Language and executed using the Cromwell workflow management system. The RNNR system was developed to manage computational resources and to distribute and execute processing tasks across networked computers. Other tools, such as Espresso-Caller and MethSeq, were developed to automate the execution of complex workflows. The computational tools built, when combined with other systems and standards developed by the community, created an ecosystem for analyzing large-scale sequencing data reproducibly across different computing environments. RNNR decreased the total analysis time of large volumes of sequencing data. The automation tools simplified the execution of analyses with hundreds of samples. The ecosystem was used to analyze thousands of sequencing samples and supported studies in genomics, transcriptomics, and epigenomics (AU)

FAPESP's process: 16/04204-8 - Development and optimization of bioinformatics protocols and tools through high-performance computing techniques for use in processing large scale biological data
Grantee: Wélliton de Souza
Support Opportunities: Scholarships in Brazil - Doctorate