Reproducibility and reuse of experiments in eScience: workflows, ontologies and scripts

Lucas Augusto Montalvão Costa Carvalho
Total Authors: 1
Document type: Doctoral Thesis
Press: Campinas, SP.
Institution: Universidade Estadual de Campinas (UNICAMP). Instituto de Computação
Defense date:
Examining board members:
Claudia Maria Bauzer Medeiros; Flávio Eduardo Aoki Horita; Eduardo Soares Ogasawara; Júlio César dos Reis; Leonardo Montecchi
Advisor: Yolanda Gil; Claudia Maria Bauzer Medeiros

Scripts and Scientific Workflow Management Systems (SWfMSs) are common approaches for automating the execution flow of processes and data analysis in scientific (computational) experiments. Although widely used in many disciplines, scripts are hard to understand, adapt, reuse, and reproduce. For this reason, several solutions have been proposed to aid experiment reproducibility in script-based environments. However, they neither allow the experiment to be fully documented nor help third parties who want to reuse just part of the code. SWfMSs, on the other hand, help documentation and reuse by supporting scientists in the design and execution of their experiments, which are specified and run as interconnected (reusable) workflow components (a.k.a. building blocks). While workflows are better than scripts for understandability and reuse, they still require additional documentation. During experiment design, scientists frequently create workflow variants, e.g., by changing workflow components. Reuse and reproducibility require understanding and tracking variant provenance, a time-consuming task. This thesis aims to support reproducibility and reuse of computational experiments. To meet these challenges, we address two research problems: (1) understanding a computational experiment, and (2) extending a computational experiment. Our work towards solving these problems led us to choose workflows and ontologies to address both.
The main contributions of this thesis are thus: (i) to present the requirements for converting scripts into reproducible research; (ii) to propose a methodology that guides scientists through the conversion of script-based experiments into reproducible workflow research objects; (iii) to design and implement features for quality assessment of computational experiments; (iv) to design and implement W2Share, a framework to support the conversion methodology, which exploits tools and standards that have been developed by the scientific community to promote reuse and reproducibility; and (v) to design and implement OntoSoft-VFF, a framework for capturing information about software and workflow components to help scientists manage workflow exploration and evolution. Our work is showcased via use cases in Molecular Dynamics, Bioinformatics and Weather Forecasting. (AU)

FAPESP's process: 14/23861-4 - Modeling data through different science experiments using scientific workflow and ontologies
Grantee: Lucas Augusto Montalvão Costa Carvalho
Support type: Scholarships in Brazil - Doctorate