
Improving Reproducibility of Results from Genetic Evaluation Analysis by Development of Data Formatting Functions for R Software

Abstract

"Recently, the concept of reproducibility of scientific results has gained importance, and practices to ensure it are being adopted by the most important scientific journals, which now require authors to deposit, along with the article, the data used and, in a few cases, the scripts and routines used in the analysis. The data formatting stage for genetic evaluation software in particular can be automated, which will help minimize errors in data preparation and standardize this stage, increasing the chances that other research groups can reproduce the final results. Formatting data to the standard required by the software can, in some cases, involve many rules and demand considerable time and attention. Errors at this stage can lead to multiple files for the same data set being used by different teams. In this scenario, reproducibility is compromised, because results may differ for the same data and analyses, which prevents their validation. To let researchers invest most of their time in interpreting results, reduce the inconvenience of file formatting, and ensure consistent, high-quality results, functions that convert data files from a raw format to the format required by the analysis software will be of great help. Furthermore, the standardization of formatting procedures carried out by these functions will improve the prospects for reproducible results and, at the same time, make the software simpler and more efficient to use. The goal of this work is to develop functions that convert raw-format files to the formats of the three software packages currently most popular in the scientific community: BLUPF90, ASREML, and Wombat.
The package will consist of a library of functions that convert files (phenotypic and pedigree data) from a basic format to the format required by specific software for genetic and genomic studies. The functions will be developed using only the resources available in R and its packages. Initially, no function will be written in C++ (or another language) and wrapped in an R function. The functions will then be bundled into an R package. For each of the chosen software packages, functions will be developed to format phenotype and pedigree files for single-trait and multi-trait analyses." (AU)
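
As an illustration of the kind of conversion function described above, the following is a minimal sketch in base R and not the project's actual code. The function name `recode_pedigree`, the column names `animal`, `sire`, and `dam`, and the set of missing-value codes are all hypothetical. It assumes the target program expects integer IDs, unknown parents coded as 0, and a space-delimited file without a header row; the exact requirements of each program should be checked in its manual.

```r
# Hypothetical helper (not part of any existing package): recode a raw
# pedigree to consecutive integer IDs, with unknown parents coded as 0.
recode_pedigree <- function(ped, missing = c("", "0", "NA")) {
  stopifnot(all(c("animal", "sire", "dam") %in% names(ped)))
  ped <- data.frame(lapply(ped[c("animal", "sire", "dam")], as.character),
                    stringsAsFactors = FALSE)
  if (anyDuplicated(ped$animal) > 0) stop("duplicated animal IDs")

  # Collect every known ID in order of first appearance; parents that have
  # no record of their own are appended after the animals.
  ids <- unique(c(ped$animal, ped$sire, ped$dam))
  ids <- ids[!is.na(ids) & !(ids %in% missing)]
  key <- setNames(seq_along(ids), ids)

  # Look up each original ID in the key; anything not found becomes 0.
  recode <- function(x) {
    code <- unname(key[x])
    code[is.na(code)] <- 0L
    as.integer(code)
  }

  list(
    pedigree = data.frame(animal = recode(ped$animal),
                          sire   = recode(ped$sire),
                          dam    = recode(ped$dam)),
    key      = data.frame(original = ids, code = seq_along(ids),
                          stringsAsFactors = FALSE)
  )
}

# Example with made-up data: NA and "" both denote an unknown parent.
raw <- data.frame(animal = c("A1", "A2", "A3"),
                  sire   = c(NA,   "A1", "A1"),
                  dam    = c("",   "D9", "A2"),
                  stringsAsFactors = FALSE)
res <- recode_pedigree(raw)

# Write a space-delimited file with no header and no row names.
out_file <- tempfile(fileext = ".ped")
write.table(res$pedigree, out_file, quote = FALSE,
            row.names = FALSE, col.names = FALSE, sep = " ")
```

Returning the ID key together with the recoded pedigree lets the same mapping be applied to the phenotype file, so that both files refer to the same animals. This is one way to keep the formatting step reproducible across teams.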
